
Showing papers on "Mixed model published in 2009"


Journal ArticleDOI

794 citations


Journal ArticleDOI
TL;DR: It is shown that random slope models have the potential to reduce residual variance by accounting for between-individual variation in slopes, which makes it easier to detect treatment effects that are applied between individuals, hence reducing type II errors as well.
Abstract: Mixed-effect models are frequently used to control for the nonindependence of data points, for example, when repeated measures from the same individuals are available. The aim of these models is often to estimate fixed effects and to test their significance. This is usually done by including random intercepts, that is, intercepts that are allowed to vary between individuals. The widespread belief is that this controls for all types of pseudoreplication within individuals. Here we show that this is not the case, if the aim is to estimate effects that vary within individuals and individuals differ in their response to these effects. In these cases, random intercept models give overconfident estimates leading to conclusions that are not supported by the data. By allowing individuals to differ in the slopes of their responses, it is possible to account for the nonindependence of data points that pseudoreplicate slope information. Such random slope models give appropriate standard errors and are easily implemented in standard statistical software. Because random slope models are not always used where they are essential, we suspect that many published findings have too narrow confidence intervals and a substantially inflated type I error rate. Besides reducing type I errors, random slope models have the potential to reduce residual variance by accounting for between-individual variation in slopes, which makes it easier to detect treatment effects that are applied between individuals, hence reducing type II errors as well.

744 citations
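The random-intercept vs. random-slope contrast described above can be sketched with statsmodels' `MixedLM` (a minimal simulation of our own, not the authors' analysis; all variable names and effect sizes are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_ind, n_obs = 30, 10
ind = np.repeat(np.arange(n_ind), n_obs)
x = np.tile(np.linspace(0.0, 1.0, n_obs), n_ind)

# Individuals differ in both intercepts and slopes (true mean slope = 2)
intercepts = rng.normal(0.0, 1.0, n_ind)
slopes = rng.normal(2.0, 1.5, n_ind)
y = intercepts[ind] + slopes[ind] * x + rng.normal(0.0, 0.5, ind.size)
df = pd.DataFrame({"y": y, "x": x, "id": ind})

# Random-intercept-only model: pseudoreplicates the slope information
ri = smf.mixedlm("y ~ x", df, groups=df["id"]).fit()

# Random-slope model: lets slopes vary between individuals
rs = smf.mixedlm("y ~ x", df, groups=df["id"], re_formula="~x").fit()

# The random-slope fit reports the (appropriately) larger standard error
print(ri.bse["x"], rs.bse["x"])
```

On data like these the random-intercept fit treats all 300 observations as carrying independent slope information even though only 30 independent slopes exist, which is exactly the overconfidence the abstract warns about.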


Journal ArticleDOI
TL;DR: The assumptions that underlie these approaches to assessing covariate effects on the mean of a continuous, dichotomous or count outcome for analyses of longitudinal repeated-measures data are examined.
Abstract: For analyses of longitudinal repeated-measures data, statistical methods include the random effects model, fixed effects model and the method of generalized estimating equations. We examine the assumptions that underlie these approaches to assessing covariate effects on the mean of a continuous, dichotomous or count outcome. Access to statistical software to implement these models has led to widespread application in numerous disciplines. However, careful consideration should be paid to their critical assumptions to ascertain which model might be appropriate in a given setting. To illustrate similarities and differences that might exist in empirical results, we use a study that assessed depressive symptoms in low-income pregnant women using a structured instrument with up to five assessments that spanned the pre-natal and post-natal periods. Understanding the conceptual differences between the methods is important in their proper application even though empirically they might not differ substantively. The choice of model in specific applications would depend on the relevant questions being addressed, which in turn informs the type of design and data collection that would be relevant.

374 citations


Book ChapterDOI
01 Jan 2009
TL;DR: In this chapter, the authors continue with Gaussian linear and additive mixed modelling methods and discuss their application on nested data.
Abstract: In this chapter, we continue with Gaussian linear and additive mixed modelling methods and discuss their application on nested data. Nested data is also referred to as hierarchical data or multilevel data in other scientific fields (Snijders and Bosker, 1999; Raudenbush and Bryk, 2002).

214 citations


Journal ArticleDOI
01 Aug 2009-Genomics
TL;DR: The method presented is more accurate, powerful and flexible than the traditional methods for analysis of qRT-PCR data and allows testing of a broader class of hypotheses than traditional analyses such as the classical comparative C(T) method.

202 citations



Journal ArticleDOI
TL;DR: This work discusses prediction of random effects and of expected responses in multilevel generalized linear models and presents approximations and suggests using parametric bootstrapping to obtain standard errors.
Abstract: We discuss prediction of random effects and of expected responses in multilevel generalized linear models. Prediction of random effects is useful for instance in small area estimation and disease mapping, effectiveness studies and model diagnostics. Prediction of expected responses is useful for planning, model interpretation and diagnostics. For prediction of random effects, we concentrate on empirical Bayes prediction and discuss three different kinds of standard errors: the posterior standard deviation and the marginal prediction error standard deviation (comparative standard errors), and the marginal sampling standard deviation (diagnostic standard error). Analytical expressions are available only for linear models and are provided in an appendix. For other multilevel generalized linear models we present approximations and suggest using parametric bootstrapping to obtain standard errors. We also discuss prediction of expectations of responses or probabilities for a new unit in a hypothetical cluster, in a new (randomly sampled) cluster or in an existing cluster. The methods are implemented in gllamm and illustrated by applying them to survey data on reading proficiency of children nested in schools. Simulations are used to assess the performance of various predictions and associated standard errors for logistic random-intercept models under a range of conditions. © 2009 Royal Statistical Society.

199 citations
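The empirical Bayes prediction of random effects discussed above (implemented in gllamm in the paper) can be illustrated with statsmodels' `MixedLM`, whose `random_effects` attribute holds the empirical Bayes (BLUP) predictions for each cluster. This is an independent sketch with simulated data, not the authors' code:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_groups, n_per = 20, 8
g = np.repeat(np.arange(n_groups), n_per)
u = rng.normal(0.0, 1.0, n_groups)            # true cluster effects
y = 1.0 + u[g] + rng.normal(0.0, 1.0, g.size)
df = pd.DataFrame({"y": y, "g": g})

fit = smf.mixedlm("y ~ 1", df, groups=df["g"]).fit()

# Empirical Bayes (BLUP) predictions of the random intercepts
blups = pd.Series({k: v["Group"] for k, v in fit.random_effects.items()})

# Raw cluster-mean deviations, for comparison: BLUPs are shrunk toward 0
raw = df.groupby("g")["y"].mean() - fit.params["Intercept"]
print(blups.abs().mean() < raw.abs().mean())  # True: shrinkage
```

With balanced clusters the shrinkage factor is the same for every cluster, so each BLUP is a fixed fraction of the corresponding raw deviation.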


Journal ArticleDOI
TL;DR: A comparison of different weighting methods in the analysis of four typical series of plant breeding trials using mixed models with fixed or random genetic effects found that the two-stage analysis gave acceptable results with fixed genetic effects.
Abstract: Series of plant breeding trials are often unbalanced and have a complex genetic structure. To reduce computing cost, it is common practice to employ a two-stage approach, where adjusted means per location are estimated and then a mixed model analysis of these adjusted means is performed. An important question is how means from the first step should be weighted in the second step. Our objective therefore was the comparison of different weighting methods in the analysis of four typical series of plant breeding trials using mixed models with fixed or random genetic effects. We used four published weighting methods and proposed three new methods. Four evaluation criteria were computed to compare methods, using one-stage analysis as benchmark. We found that the two-stage analysis gave acceptable results with fixed genetic effects. When genetic effects were taken as random in stage two, in three of four datasets the two-stage analysis gave acceptable results. In both cases differences between weighting methods were small and the best weighting method depended on the dataset but not on the evaluation criteria. A two-stage analysis without weighting also produced acceptable results, but weighting mostly performed better. In the fourth dataset the missing data pattern was informative, resulting in violation of the missing-at-random (MAR) assumption in one- and two-stage analysis. In this case both analyses were not strictly valid.

190 citations


Journal ArticleDOI
TL;DR: In this paper, a stochastic integral projection model (IPM) is proposed to capture the interannual variability in survival, growth rate, and fecundity of plants.
Abstract: Most plant and animal populations have substantial interannual variability in survival, growth rate, and fecundity. They also exhibit substantial variability among individuals in traits such as size, age, condition, and disease status that have large impacts on individual fates and consequently on the future of the population. We present here methods for constructing and analyzing a stochastic integral projection model (IPM) incorporating both of these forms of variability, illustrated through a case study of the monocarpic thistle Carlina vulgaris. We show how model construction can exploit the close correspondence between stochastic IPMs and statistical analysis of trait-fate relationships in a "mixed" or "hierarchical" models framework. This correspondence means that IPMs can be parameterized straightforwardly from data using established statistical techniques and software (vs. the largely ad hoc methods for stochastic matrix models), properly accounting for sampling error and between-year sample size variation and with vastly fewer parameters than a conventional stochastic matrix model. We show that the many tools available for analyzing stochastic matrix models (such as the stochastic growth rate λS, small-variance approximations, elasticity/sensitivity analysis, and life table response experiment (LTRE) analysis) can be used for IPMs, and we give computational formulas for elasticity/sensitivity analyses. We develop evolutionary analyses based on the connection between growth rate sensitivity and selection gradients and present a new method using techniques from functional data analysis to study the evolution of function-valued traits such as size-dependent flowering probability. For Carlina we found consistent selection against variability in both state-specific transition rates and the fitted functions describing state dependence in demographic rates.
For most of the regression parameters defining the IPM there was also selection against temporal variance; however, in some cases the effects of nonlinear averaging were big enough to favor increased temporal variation. The LTRE analysis identified year-to-year variation in survival as the dominant factor in population growth variability. Evolutionary analysis of flowering strategy showed that the entire functional relationship between plant size and flowering probability is at or near an evolutionarily stable strategy (ESS) shaped by the size-specific trade-off between the benefit (fecundity) and cost (mortality) of flowering in a temporally varying environment.

158 citations


Journal ArticleDOI
TL;DR: It is shown that GMFLMs are, in fact, generalized multilevel mixed models, which can be analyzed using the mixed effects inferential machinery and can be generalized within a well-researched statistical framework.
Abstract: We introduce Generalized Multilevel Functional Linear Models (GMFLMs), a novel statistical framework for regression models where exposure has a multilevel functional structure. We show that GMFLMs are, in fact, generalized multilevel mixed models. Thus, GMFLMs can be analyzed using the mixed effects inferential machinery and can be generalized within a well-researched statistical framework. We propose and compare two methods for inference: (1) a two-stage frequentist approach; and (2) a joint Bayesian analysis. Our methods are motivated by and applied to the Sleep Heart Health Study, the largest community cohort study of sleep. However, our methods are general and easy to apply to a wide spectrum of emerging biological and medical datasets. Supplemental materials for this article are available online.

148 citations


Journal ArticleDOI
TL;DR: A formula is proposed to estimate the sample size required to detect an interaction between two binary variables in a factorial design with repeated measures of a continuous outcome, based on the fact that the variance of an interaction is fourfold that of the main effect.
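The fourfold-variance argument above can be turned into arithmetic. The sketch below uses the standard two-sample normal approximation with a design effect for averaged repeated measures; it illustrates the principle and is not the paper's exact formula (the numeric inputs are hypothetical):

```python
from math import ceil

def n_per_group(delta, sd, rho, n_rep, z_alpha=1.96, z_power=0.84):
    """Approximate n per arm to detect a mean difference `delta` (SD `sd`)
    using the average of n_rep repeated measures with within-subject
    correlation rho (two-sided alpha = 0.05, power = 0.80)."""
    design_effect = (1 + (n_rep - 1) * rho) / n_rep
    return ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 * design_effect / delta ** 2)

n_main = n_per_group(delta=0.5, sd=1.0, rho=0.5, n_rep=4)

# The variance of the interaction contrast is fourfold that of a main
# effect, so detecting an interaction of the same size needs 4x the n:
n_interaction = 4 * n_main
print(n_main, n_interaction)  # 40 160
```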

Journal ArticleDOI
TL;DR: Factor analytic models provide a natural framework for modelling genotype × environment interaction type problems and are likely to see increasing use due to the parsimonious description of covariance structures available, the scope for direct interpretation of factors as well as computational advantages.
Abstract: Background: Analysis of data on genotypes with different expression in different environments is a classic problem in quantitative genetics. A review of models for data with genotype × environment interactions and related problems is given, linking early, analysis of variance based formulations to their modern, mixed model counterparts. Results: It is shown that models developed for the analysis of multi-environment trials in plant breeding are directly applicable in animal breeding. In particular, the 'additive main effect, multiplicative interaction' models accommodate heterogeneity of variance and are characterised by a factor-analytic covariance structure. While this can be implemented in mixed models by imposing such structure on the genetic covariance matrix in a standard, multi-trait model, an equivalent model is obtained by fitting the common and specific genetic factors separately. Properties of the mixed model equations for alternative implementations of factor-analytic models are discussed, and extensions to structured modelling of covariance matrices for multi-trait, multi-environment scenarios are described. Conclusion: Factor analytic models provide a natural framework for modelling genotype × environment interaction type problems. Mixed model analyses fitting such models are likely to see increasing use due to the parsimonious description of covariance structures available, the scope for direct interpretation of factors as well as computational advantages.

Journal ArticleDOI
TL;DR: In this article, the authors extend the joint ranks estimators to estimating the fixed effects in a linear model with cluster correlated continuous error distributions for general score functions, and discuss the asymptotic theory of the estimators and standard errors of estimators.
Abstract: R estimators based on the joint ranks (JR) of all the residuals have been developed over the last 20 years for fitting linear models with independently distributed errors. In this article, we extend these estimators to estimating the fixed effects in a linear model with cluster correlated continuous error distributions for general score functions. We discuss the asymptotic theory of the estimators and standard errors of the estimators. For the related mixed model with a single random effect, we discuss robust estimators of the variance components. These are used to obtain Studentized residuals for the JR fit. A real example is discussed, which illustrates the efficiency of the JR analysis over the traditional analysis and the efficiency of a prudent choice of a score function. Simulation studies over situations similar to the example confirm the validity and efficiency of the analysis.

Journal ArticleDOI
TL;DR: When the error structure is complex, it is generally worth thoroughly exploring different data transformations before embarking on a more complex mixed model analysis.
Abstract: Mixed model packages can be used to analyze designed experiments with variance structures that allow for heterogeneity of variance between treatments. Such analyses are useful, when the error structure is complex. Alternatively, when a simple data transformation is found that stabilizes the variance, a standard ANOVA can be performed. Such a simple analysis has several practical advantages, including efficient use of error degrees of freedom and the facility to produce letter displays with a constant critical difference when data are balanced. It is therefore generally worth thoroughly exploring different data transformations before embarking on a more complex mixed model analysis.

Journal ArticleDOI
TL;DR: An advanced nonlinear mixed-effects (NLME) model is presented for modeling and predicting degradation in nuclear piping system and offers considerable improvement by reducing the variance associated with degradation of a specific unit, which leads to more realistic estimates of risk.

Journal ArticleDOI
TL;DR: It is shown that bias can be induced for regression coefficients when random effects are truly correlated but misspecified as independent in a 2-part mixed model.
Abstract: Semicontinuous data in the form of a mixture of zeros and continuously distributed positive values frequently arise in biomedical research. Two-part mixed models with correlated random effects are an attractive approach to characterize the complex structure of longitudinal semicontinuous data. In practice, however, an independence assumption about random effects in these models may often be made for convenience and computational feasibility. In this article, we show that bias can be induced for regression coefficients when random effects are truly correlated but misspecified as independent in a 2-part mixed model. Paralleling work on bias under nonignorable missingness within a shared parameter model, we derive and investigate the asymptotic bias in selected settings for misspecified 2-part mixed models. The performance of these models in practice is further evaluated using Monte Carlo simulations. Additionally, the potential bias is investigated when artificial zeros, due to left censoring from some detection or measuring limit, are incorporated. To illustrate, we fit different 2-part mixed models to the data from the University of Toronto Psoriatic Arthritis Clinic, the aim being to examine whether there are differential effects of disease activity and damage on physical functioning as measured by the health assessment questionnaire scores over the course of psoriatic arthritis. Some practical issues on variance component estimation revealed through this data analysis are considered.

Journal ArticleDOI
TL;DR: In this article, a meta-analysis on influencing factors controlling the constant decay rate of coarse woody debris was set up, based on an intensive literature research a nonlinear mixed effects model was constructed.

Journal ArticleDOI
TL;DR: In this paper, a new hidden Markov model for the space-time evolution of daily rainfall is developed which models precipitation within hidden regional weather types by censored power-transformed Gaussian distributions.
Abstract: Summary. A new hidden Markov model for the space–time evolution of daily rainfall is developed which models precipitation within hidden regional weather types by censored power-transformed Gaussian distributions. The latter provide flexible and interpretable multivariate models for the mixed discrete–continuous variables that describe both precipitation, when it occurs, and no precipitation. Parameter estimation is performed by using a Monte Carlo EM algorithm whose use and performance are evaluated by simulation studies. The model is fitted to rainfall data from a small network of stations in New Zealand encompassing a diverse range of orographic effects. The results that are obtained show that the marginal distributions and spatial structure of the data are well described by the fitted model which provides a better description of the spatial structure of precipitation than a standard hidden Markov model that is commonly used in the literature. However, the fitted model, like the standard hidden Markov model, cannot fully reproduce the local dynamics and underestimates the lag 1 auto-correlations.

Journal ArticleDOI
TL;DR: A mixed model based on the convenience rate law and the Michaelis-Menten equation is the most suitable deterministic modeling approach followed by a reversible generalized mass action kinetics model.
Abstract: To understand the dynamic behavior of cellular systems, mathematical modeling is often necessary and comprises three steps: (1) experimental measurement of participating molecules, (2) assignment of rate laws to each reaction, and (3) parameter calibration with respect to the measurements. In each of these steps the modeler is confronted with a plethora of alternative approaches, e. g., the selection of approximative rate laws in step two as specific equations are often unknown, or the choice of an estimation procedure with its specific settings in step three. This overall process with its numerous choices and the mutual influence between them makes it hard to single out the best modeling approach for a given problem. We investigate the modeling process using multiple kinetic equations together with various parameter optimization methods for a well-characterized example network, the biosynthesis of valine and leucine in C. glutamicum. For this purpose, we derive seven dynamic models based on generalized mass action, Michaelis-Menten and convenience kinetics as well as the stochastic Langevin equation. In addition, we introduce two modeling approaches for feedback inhibition to the mass action kinetics. The parameters of each model are estimated using eight optimization strategies. To determine the most promising modeling approaches together with the best optimization algorithms, we carry out a two-step benchmark: (1) coarse-grained comparison of the algorithms on all models and (2) fine-grained tuning of the best optimization algorithms and models. To analyze the space of the best parameters found for each model, we apply clustering, variance, and correlation analysis. A mixed model based on the convenience rate law and the Michaelis-Menten equation, in which all reactions are assumed to be reversible, is the most suitable deterministic modeling approach followed by a reversible generalized mass action kinetics model. 
A Langevin model is advisable to take stochastic effects into account. To estimate the model parameters, three algorithms are particularly useful: For first attempts the settings-free Tribes algorithm yields valuable results. Particle swarm optimization and differential evolution provide significantly better results with appropriate settings.

Journal ArticleDOI
TL;DR: In this article, the authors compared the accuracy of various statistical models, including least squares multiple linear regression, generalized additive modeling, ordinary kriging, and linear mixed modeling (LMM), for estimating stream temperatures in Michigan and Wisconsin.
Abstract: Estimating stream temperatures across broad spatial extents is important for regional conservation of running waters. Although statistical models can be useful in this endeavor, little information exists to aid in the selection of a particular statistical approach. Our objective was to compare the accuracy of ordinary least-squares multiple linear regression, generalized additive modeling, ordinary kriging, and linear mixed modeling (LMM) using July mean stream temperatures in Michigan and Wisconsin. Although LMM using low-rank thin-plate smoothing splines to measure the spatial autocorrelation in stream temperatures was the most accurate modeling approach, overall there were only slight differences in prediction accuracy among the evaluated approaches. This suggests that managers and researchers can select a stream temperature modeling approach that meets their level of expertise without sacrificing substantial amounts of prediction accuracy. The most accurate models for Michigan and Wisconsin had root mean square errors of 2.0-2.3°C, suggesting that only relatively coarse predictions can be produced from landscape-based statistical models at regional scales. Explaining substantially more variability in stream temperatures likely will require the collection of finer-scale hydrologic and physiographic data, which may be cost prohibitive for monitoring and assessing stream temperatures at regional scales.

DOI
15 Oct 2009
TL;DR: The authors describe generalized linear latent and mixed models (GLLAMMs) and illustrate their potential in epidemiology and demonstrate their utility in three applications involving repeated measurements, measurement error and multilevel data.
Abstract: We describe generalized linear latent and mixed models (GLLAMMs) and illustrate their potential in epidemiology. GLLAMMs include many types of multilevel random effect, factor and structural equation models. A wide range of response types are accommodated including continuous, dichotomous, ordinal and nominal responses as well as counts and survival times. Multivariate responses can furthermore be of mixed types. The utility of GLLAMMs is illustrated in three applications involving repeated measurements, measurement error and multilevel data.

Journal ArticleDOI
TL;DR: A CCC for longitudinal repeated measurements is developed through the appropriate specification of the intraclass correlation coefficient from a variance components linear mixed model.
Abstract: The concordance correlation coefficient (CCC) is an index that is commonly used to assess the degree of agreement between observers on measuring a continuous characteristic. Here, a CCC for longitudinal repeated measurements is developed through the appropriate specification of the intraclass correlation coefficient from a variance components linear mixed model. A case example and the results of a simulation study are provided.
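Expressed as an intraclass correlation from mixed-model variance components, the CCC has the generic shape sketched below (a simplified form with hypothetical component values; the paper's longitudinal version involves additional terms for time and observer effects):

```python
def ccc_from_components(var_subject, var_observer, var_error, var_bias=0.0):
    """CCC written as an intraclass correlation from the variance
    components of a linear mixed model: between-subject variance over
    total variance (plus any systematic observer-bias term)."""
    return var_subject / (var_subject + var_observer + var_error + var_bias)

# Hypothetical components from a fitted variance-components mixed model:
print(round(ccc_from_components(4.0, 0.5, 1.5), 3))  # 0.667
```

Agreement is high when between-subject variability dominates the observer and residual components, and is degraded by any systematic bias between observers.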

Journal ArticleDOI
TL;DR: In this paper, the authors compare mixed models and generalized estimating equations (GEE) with illustrative examples from a 3-year study of mallard (Anas platyrhynchos) nest structures.
Abstract: Summary 1. Statistical methods that assume independence among observations result in optimistic estimates of uncertainty when applied to correlated data, which are ubiquitous in applied ecological research. Mixed effects models offer a potential solution and rely on the assumption that latent or unobserved characteristics of individuals (i.e. random effects) induce correlation among repeated measurements. However, careful consideration must be given to the interpretation of parameters when using a nonlinear link function (e.g. logit). Mixed model regression parameters reflect the change in the expected response within an individual associated with a change in that individual’s covariates [i.e. a subject-specific (SS) interpretation], which may not address a relevant scientific question. In particular, a SS interpretation is not natural for covariates that do not vary within individuals (e.g. gender). 2. An alternative approach combines the solution to an unbiased estimating equation with robust measures of uncertainty to make inferences regarding predictor–outcome relationships. Regression parameters describe changes in the average response among groups of individuals differing in their covariates [i.e. a population-averaged (PA) interpretation]. 3. We compare these two approaches [mixed models and generalized estimating equations (GEE)] with illustrative examples from a 3-year study of mallard (Anas platyrhynchos) nest structures. We observe that PA and SS responses differ when modelling binary data, with PA parameters behaving like attenuated versions of SS parameters. Differences between SS and PA parameters increase with the size of among-subject heterogeneity captured by the random effects variance component. Lastly, we illustrate how PA inferences can be derived (post hoc) from fitted generalized and nonlinear-mixed models. 4. Synthesis and applications. Mixed effects models and GEE offer two viable approaches to modelling correlated data. 
The preferred method should depend primarily on the research question (i.e. desired parameter interpretation), although operating characteristics of the associated estimation procedures should also be considered. Many applied questions in ecology, wildlife management and conservation biology (including the current illustrative examples) focus on population performance measures (e.g. mean survival or nest success rates) as a function of general landscape features, for which the PA model interpretation, not the more commonly used SS model interpretation may be more natural.
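The attenuation of population-averaged (PA) relative to subject-specific (SS) logistic parameters noted above has a well-known closed-form approximation (Zeger, Liang and Albert 1988), sketched here for a random-intercept model:

```python
from math import pi, sqrt

def pa_from_ss(beta_ss, re_var):
    """Approximate population-averaged (marginal) logistic coefficient
    implied by a subject-specific coefficient `beta_ss` and a random-
    intercept variance `re_var`:
    beta_PA ~= beta_SS / sqrt(1 + c^2 * re_var), c = 16*sqrt(3)/(15*pi)."""
    c2 = (16.0 * sqrt(3.0) / (15.0 * pi)) ** 2
    return beta_ss / sqrt(1.0 + c2 * re_var)

# Attenuation grows with the between-subject variance component:
print(pa_from_ss(1.0, 1.0))  # ~0.862
print(pa_from_ss(1.0, 4.0))  # ~0.648
```

This makes concrete the observation that PA parameters behave like attenuated versions of SS parameters, with the attenuation increasing in the random-effects variance.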

Journal ArticleDOI
TL;DR: Key elements of association analysis with mixed models are reviewed, including modeling phenotype-genotype associations using mixed models, population stratification, kinship and its estimation, variance component estimation, use of best linear unbiased predictors or residuals in place of raw phenotype, improving efficiency and software-user interaction.
Abstract: Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample size and number of markers used for GWAS is increasing dramatically, resulting in greater statistical power to detect those associations. The use of mixed models with increasingly large data sets depends on the availability of software for analyzing those models. While multiple software packages implement the mixed model method, no single package provides the best combination of fast computation, ability to handle large samples, flexible modeling and ease of use. Key elements of association analysis with mixed models are reviewed, including modeling phenotype-genotype associations using mixed models, population stratification, kinship and its estimation, variance component estimation, use of best linear unbiased predictors or residuals in place of raw phenotype, improving efficiency and software-user interaction. The available software packages are evaluated, and suggestions made for future software development.

Journal ArticleDOI
TL;DR: In this paper, the authors utilize skew-normal/independent distributions as a tool for robust modeling of linear mixed models under a Bayesian paradigm, which is an attractive class of asymmetric heavy-tailed distributions.

Journal ArticleDOI
TL;DR: This article extends the standard logistic mixed model by adding a subject-level random effect to the within-subject variance specification, and permits subjects to have influence on the mean, or location, and variability, or (square of the) scale, of their responses.
Abstract: Mixed-effects logistic regression models are described for analysis of longitudinal ordinal outcomes, where observations are observed clustered within subjects. Random effects are included in the model to account for the correlation of the clustered observations. Typically, the error variance and the variance of the random effects are considered to be homogeneous. These variance terms characterize the within-subjects (i.e., error variance) and between-subjects (i.e., random-effects variance) variation in the data. In this article, we describe how covariates can influence these variances, and also extend the standard logistic mixed model by adding a subject-level random effect to the within-subject variance specification. This permits subjects to have influence on the mean, or location, and variability, or (square of the) scale, of their responses. Additionally, we allow the random effects to be correlated. We illustrate application of these models for ordinal data using Ecological Momentary Assessment (EMA) data, or intensive longitudinal data, from an adolescent smoking study. These mixed-effects ordinal location scale models have useful applications in mental health research where outcomes are often ordinal and there is interest in subject heterogeneity, both between- and within-subjects.

Journal ArticleDOI
TL;DR: The proposed approach is shown to have substantially less bias than the regression calibration approach proposed by Ye et al. (2008) and can be implemented with standard statistical software and does not require complex estimation techniques.
Abstract: Ye, Lin, and Taylor (2008, Biometrics 64, 1238-1246) proposed a joint model for longitudinal measurements and time-to-event data in which the longitudinal measurements are modeled with a semiparametric mixed model to allow for the complex patterns in longitudinal biomarker data. They proposed a two-stage regression calibration approach that is simpler to implement than a joint modeling approach. In the first stage of their approach, the mixed model is fit without regard to the time-to-event data. In the second stage, the posterior expectation of an individual's random effects from the mixed-model are included as covariates in a Cox model. Although Ye et al. (2008) acknowledged that their regression calibration approach may cause a bias due to the problem of informative dropout and measurement error, they argued that the bias is small relative to alternative methods. In this article, we show that this bias may be substantial. We show how to alleviate much of this bias with an alternative regression calibration approach that can be applied for both discrete and continuous time-to-event data. Through simulations, the proposed approach is shown to have substantially less bias than the regression calibration approach proposed by Ye et al. (2008). In agreement with the methodology proposed by Ye et al. (2008), an advantage of our proposed approach over joint modeling is that it can be implemented with standard statistical software and does not require complex estimation techniques.

Journal ArticleDOI
TL;DR: The authors used three forest plots with different spatial patterns of tree locations (i.e., clustered, random, and regular patterns) to investigate the spatial distributions and heterogeneity in the model residuals from six regression models with the ordinary least squares (OLS) as the benchmark.
Abstract: Spatial effects include spatial autocorrelation and heterogeneity. Ignoring spatial effects in a modeling process causes misleading significance tests and suboptimal model prediction. In this study, we used three forest plots with different spatial patterns of tree locations (i.e., clustered, random, and regular patterns) to investigate the spatial distributions and heterogeneity in the model residuals from six regression models with the ordinary least squares (OLS) as the benchmark. Our results revealed that when significant spatial autocorrelations and variations existed in the relationship between tree height and diameter, as in the softwood plot (clustered) and hardwood plot (random), OLS was not appropriate for modeling the relationship between tree variables. Spatial regression models (i.e., spatial lag and spatial error models) were effective for accounting for spatial autocorrelation in the model residuals, but they were insufficient to deal with the problem of spatial heterogeneity. It was evident that the model residuals in both spatial lag and spatial error models had a similar pattern and magnitudes of spatial heterogeneity at spatial scales different from those of the OLS model. In contrast, the linear mixed model and geographically weighted regression incorporated the spatial dependence and variation into modeling processes, and consequently, fitted the data better and predicted the response variable more accurately. The model residuals from both the linear mixed model and geographically weighted regression had desirable spatial distributions, meaning fewer clusters of similar or dissimilar model residuals over space. For. Sci. 55(6):533-548.
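A compact sketch of why a single global OLS fit leaves large, spatially structured residuals when the height-diameter slope varies over the plot, while a geographically weighted fit absorbs most of that variation (simulated data with an invented spatial trend, not the authors' forest plots):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
coords = rng.uniform(0.0, 10.0, (n, 2))      # tree locations in the plot
diam = rng.uniform(10.0, 50.0, n)            # diameter (predictor)
slope = 0.5 + 0.05 * coords[:, 0]            # height-diameter slope drifts in space
height = 2.0 + slope * diam + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), diam])

# Global OLS: a single slope for the whole plot.
ols_coef, *_ = np.linalg.lstsq(X, height, rcond=None)
ols_resid = height - X @ ols_coef

# GWR: refit at every tree with Gaussian distance weights (fixed bandwidth).
bw = 2.0
gwr_resid = np.empty(n)
for i in range(n):
    d2 = ((coords - coords[i]) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * bw ** 2))
    Xw = X * w[:, None]
    coef = np.linalg.solve(X.T @ Xw, Xw.T @ height)  # weighted normal equations
    gwr_resid[i] = height[i] - X[i] @ coef

# The spatially varying slope inflates the OLS residuals; local fits do not.
print(ols_resid.std(), gwr_resid.std())
```

The bandwidth `bw` plays the role of the spatial scale at which heterogeneity is modeled; in practice it would be chosen by cross-validation rather than fixed as here.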

Journal ArticleDOI
TL;DR: In this paper, a parametric approach to study the probability distribution of rainfall data at scales of hydrologic interest (e.g. from few minutes up to daily) requires the use of mixed distributions with a discrete part accounting for the occurrence of rain and a continuous one for the rainfall amount.
Abstract: A comprehensive parametric approach to study the probability distribution of rainfall data at scales of hydrologic interest (e.g. from few minutes up to daily) requires the use of mixed distributions with a discrete part accounting for the occurrence of rain and a continuous one for the rainfall amount. In particular, when a bivariate vector (X, Y) is considered (e.g. simultaneous observations from two rainfall stations or from two instruments such as radar and rain gauge), it is necessary to resort to a bivariate mixed model. A quite flexible mixed distribution can be defined by using a 2-copula and four marginals, obtaining a bivariate copula-based mixed model. Such a distribution is able to correctly describe the intermittent nature of rainfall and the dependence structure of the variables. Furthermore, without loss of generality and with a gain in parsimony this model can be simplified by some transformations of the marginals. The main goals of this work are: (1) to empirically explore the behaviour of the parameters of marginal transformations as a function of time scale and inter-gauge distance, by analysing data from a network of rain gauges; (2) to compare the properties of the regression curves associated to the copula-based mixed model with those derived from the model simplified by transformations of the marginals. The results from the investigation of transformations’ parameters are in agreement with the expected theoretical dependence on inter-gauge distance, and show dependence on time scale. The analysis on the regression curves points out that: (1) a copula-based mixed model involves regression curves quite close to some non-parametric models; (2) the performance of the parametric regression decreases in the same cases in which non-parametric regression shows some instability; (3) the copula-based mixed model and its simplified version show similar behaviour in terms of regression for mid-low values of rainfall.
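The copula-based mixed construction can be sketched by sampling from a 2-copula and pushing the uniforms through marginals that mix an atom at zero with a continuous rainfall amount (a Gaussian copula and exponential continuous part are assumed here purely for illustration; the paper does not commit to these choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 50_000
rho = 0.7        # Gaussian 2-copula dependence parameter (invented)
p_dry = 0.6      # marginal probability of zero rainfall (invented)

# Sample the copula: correlated uniforms on the unit square.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
u = stats.norm.cdf(z)

def mixed_inverse_cdf(u, p_dry, scale=5.0):
    """Invert a mixed marginal: an atom at zero plus an exponential amount."""
    amount = np.zeros_like(u)
    wet = u > p_dry
    # Rescale the uniform above the atom, then invert the continuous part.
    amount[wet] = stats.expon.ppf((u[wet] - p_dry) / (1.0 - p_dry), scale=scale)
    return amount

x = mixed_inverse_cdf(u[:, 0], p_dry)
y = mixed_inverse_cdf(u[:, 1], p_dry)

both_wet = (x > 0) & (y > 0)
print((x == 0).mean())                 # recovers the marginal dry probability
print(((x == 0) & (y == 0)).mean())    # exceeds p_dry**2: dependent intermittency
print(np.corrcoef(x[both_wet], y[both_wet])[0, 1])
```

The construction reproduces both features the abstract emphasizes: correct intermittency at each gauge, and dependence both in the occurrence of rain and in the amounts when both gauges are wet.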

Journal ArticleDOI
TL;DR: This work proposes a joint model for analysis of longitudinal measurements and competing risks failure time data which is robust in the presence of outlying longitudinal observations during follow-up, and derives an EM algorithm for the maximum likelihood estimates of the parameters and estimate their standard errors using a profile likelihood method.
Abstract: Existing methods for joint modeling of longitudinal measurements and survival data can be highly influenced by outliers in the longitudinal outcome. We propose a joint model for analysis of longitudinal measurements and competing risks failure time data which is robust in the presence of outlying longitudinal observations during follow-up. Our model consists of a linear mixed effects sub-model for the longitudinal outcome and a proportional cause-specific hazards frailty sub-model for the competing risks data, linked together by latent random effects. Instead of the usual normality assumption for measurement errors in the linear mixed effects sub-model, we adopt a t-distribution which has a longer tail and thus is more robust to outliers. We derive an EM algorithm for the maximum likelihood estimates of the parameters and estimate their standard errors using a profile likelihood method. The proposed method is evaluated by simulation studies and is applied to a scleroderma lung study.
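The robustness mechanism, t-distributed errors handled inside EM as a scale mixture of normals, can be sketched in a stripped-down location model (this is not the authors' joint longitudinal/competing-risks model; only the EM weighting idea is shared, and all values are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(10.0, 1.0, 200)
y[:10] += 50.0   # gross outliers, as in contaminated longitudinal data

# EM for a t(nu) location-scale model: a t error is a normal whose precision
# is scaled by a Gamma variable, so the E-step yields a per-observation
# weight that shrinks toward zero for outliers.
nu = 4.0
mu, sigma2 = y.mean(), y.var()
for _ in range(50):
    w = (nu + 1.0) / (nu + (y - mu) ** 2 / sigma2)   # E-step weights
    mu = np.average(y, weights=w)                    # M-step: location
    sigma2 = np.sum(w * (y - mu) ** 2) / len(y)      # M-step: scale

print(mu, y.mean())  # robust estimate vs. outlier-inflated sample mean
```

The same weighting appears inside the mixed-effects E-step of the article's algorithm: observations far from their fitted trajectory receive small weights and therefore have little influence on the estimated random effects and hazard parameters.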