scispace - formally typeset

Showing papers on "Mixed model published in 2003"



Journal ArticleDOI
Ronghui Xu1
TL;DR: The well-known R2 measure for linear regression is generalized to linear mixed effects models, motivated by a cluster-randomized study comparing two versions of an informed consent document, and the performance of the proposed measures is studied through Monte Carlo simulations.
Abstract: We generalize the well-known R2 measure for linear regression to linear mixed effects models. Our work was motivated by a cluster-randomized study conducted by the Eastern Cooperative Oncology Group to compare two different versions of an informed consent document. We quantify the variation in the response that is explained by the covariates under the linear mixed model, and study three types of measures to estimate such quantities. The first type of measure makes direct use of the estimated variances; the second uses residual sums of squares in analogy to linear regression; the third is based on the Kullback–Leibler information gain. All the measures can be easily obtained from software programs that fit linear mixed models. We study the performance of the measures through Monte Carlo simulations, and illustrate the usefulness of the measures on data sets. Copyright © 2003 John Wiley & Sons, Ltd.
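The first, variance-based type of measure can be sketched directly from fitted variance components; the function name and the component values below are illustrative, not Xu's notation or estimates:

```python
def r2_mixed(var_fixed, var_random, var_resid):
    """Variance-based R^2 for a linear mixed model: the share of total
    variance attributable to the fixed-effect (covariate) part."""
    total = var_fixed + var_random + var_resid
    return var_fixed / total

# Hypothetical variance components from a fitted model
r2 = r2_mixed(var_fixed=4.0, var_random=1.0, var_resid=3.0)
```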

538 citations


Journal ArticleDOI
TL;DR: The utility of the bounded cumulative hazard model in cure rate estimation is considered, which is an appealing alternative to the widely used two-component mixture model, and is particularly suitable for semiparametric and Bayesian methods of statistical inference.
Abstract: This article considers the utility of the bounded cumulative hazard model in cure rate estimation, which is an appealing alternative to the widely used two-component mixture model. This approach has the following distinct advantages: (1) It allows for a natural way to extend the proportional hazards regression model, leading to a wide class of extended hazard regression models. (2) In some settings the model can be interpreted in terms of biologically meaningful parameters. (3) The model structure is particularly suitable for semiparametric and Bayesian methods of statistical inference. Notwithstanding the fact that the model has been around for less than a decade, a large body of theoretical results and applications has been reported to date. This review article is intended to give a big picture of these modeling techniques and associated statistical problems. These issues are discussed in the context of survival data in cancer.
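The bounded cumulative hazard structure S(t) = exp(−θF(t)) makes the cure fraction explicit, since S(t) → exp(−θ) as t → ∞. A minimal sketch, with an exponential cdf chosen purely for illustration:

```python
import math

def survival_bch(t, theta, F):
    """Population survival under the bounded cumulative hazard model:
    S(t) = exp(-theta * F(t)), where F is a proper cdf on [0, inf)."""
    return math.exp(-theta * F(t))

# F and theta are illustrative choices, not from the review
F_exp = lambda t: 1.0 - math.exp(-t)   # exponential(1) cdf
theta = 1.2
cure_fraction = math.exp(-theta)       # limiting survival = cured proportion
s_large = survival_bch(50.0, theta, F_exp)
```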

335 citations


Journal ArticleDOI
Matt P. Wand1
TL;DR: This work has shown that smoothing methods that use basis functions with penalisation can be formulated as maximum likelihood estimators and best predictors in a mixed model framework.
Abstract: Smoothing methods that use basis functions with penalisation can be formulated as maximum likelihood estimators and best predictors in a mixed model framework. Such connections are at least a quarter of a century old but, perhaps with the advent of mixed model software, have led to a paradigm shift in the field of smoothing. The reason is that most, perhaps all, models involving smoothing can be expressed as a mixed model and hence enjoy the benefit of the growing body of methodology and software for general mixed model analysis. The handling of other complications such as clustering, missing data and measurement error is generally quite straightforward with mixed model representations of smoothing.

325 citations


Journal ArticleDOI
TL;DR: It is shown how a variety of IRT models can be formulated as particular instances of nonlinear mixed models; the unifying framework offers the advantage that relations between different IRT models become explicit and that it is rather straightforward to see how existing IRT models can be adapted and extended.
Abstract: Mixed models take the dependency between observations based on the same cluster into account by introducing one or more random effects. Common item response theory (IRT) models introduce latent person variables to model the dependence between responses of the same participant. Assuming a distribution for the latent variables, these IRT models are formally equivalent with nonlinear mixed models. It is shown how a variety of IRT models can be formulated as particular instances of nonlinear mixed models. The unifying framework offers the advantage that relations between different IRT models become explicit and that it is rather straightforward to see how existing IRT models can be adapted and extended. The approach is illustrated with a self-report study on anger.
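The simplest instance of this equivalence is the Rasch model: a logistic mixed model with a random person effect (ability) and fixed item effects (difficulty). A minimal sketch:

```python
import math

def rasch_prob(theta, b):
    """Rasch model: probability of a correct response for a person with
    latent ability theta on an item with difficulty b. In mixed-model
    terms, theta is a random person effect and b a fixed item effect."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

p_equal = rasch_prob(0.0, 0.0)   # ability equals difficulty
p_hard = rasch_prob(0.0, 2.0)    # harder item, same person
```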

228 citations


Journal ArticleDOI
TL;DR: Comparisons based on Bayes factors and related criteria favour models with a genetically structured residual variance heterogeneity, and there is strong evidence of a negative correlation between the additive genetic values affecting litter size and those affecting residual variance.
Abstract: Normal mixed models with different levels of heterogeneity in the residual variance are fitted to pig litter size data. Exploratory analysis and model assessment is based on examination of various posterior predictive distributions. Comparisons based on Bayes factors and related criteria favour models with a genetically structured residual variance heterogeneity. There is, moreover, strong evidence of a negative correlation between the additive genetic values affecting litter size and those affecting residual variance. The models are also compared according to the purposes for which they might be used, such as prediction of ‘future’ data, inference about response to selection and ranking candidates for selection. A brief discussion is given of some implications for selection of the genetically structured residual variance model.

151 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed and evaluated three Bayesian multivariate meta-analysis models: two multivariate analogues of the traditional univariate random effects models which make different assumptions about the relationships between studies and estimates, and a multivariate random effect model which is a Bayesian adaptation of the mixed model approach, and illustrated through an analysis of a new data set on parental smoking and two health outcomes (asthma and lower respiratory disease) in children.
Abstract: Meta-analysis is now a standard statistical tool for assessing the overall strength and interesting features of a relationship, on the basis of multiple independent studies. There is, however, recent acknowledgement of the fact that in many applications responses are rarely uniquely determined. Hence there has been some change of focus from a single response to the analysis of multiple outcomes. In this paper we propose and evaluate three Bayesian multivariate meta-analysis models: two multivariate analogues of the traditional univariate random effects models which make different assumptions about the relationships between studies and estimates, and a multivariate random effects model which is a Bayesian adaptation of the mixed model approach. Our preferred method is then illustrated through an analysis of a new data set on parental smoking and two health outcomes (asthma and lower respiratory disease) in children.
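As a point of reference for the univariate random effects models being generalized, the standard frequentist pooling step (DerSimonian-Laird, not one of the paper's Bayesian models) can be sketched as follows; the effect sizes and variances are hypothetical:

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Univariate random-effects pooling: method-of-moments estimate of the
    between-study variance tau^2, then inverse-variance weighting with
    weights 1 / (v_i + tau^2). Returns (pooled effect, tau^2)."""
    effects = np.asarray(effects, float)
    v = np.asarray(variances, float)
    w = 1.0 / v
    fixed = np.sum(w * effects) / np.sum(w)          # fixed-effect pool
    q = np.sum(w * (effects - fixed) ** 2)           # heterogeneity statistic
    df = effects.size - 1
    tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_star = 1.0 / (v + tau2)
    return np.sum(w_star * effects) / np.sum(w_star), tau2

# Illustrative study-level effects and within-study variances
pooled, tau2 = dersimonian_laird([0.2, 0.5, 0.3, 0.6], [0.04, 0.05, 0.03, 0.06])
```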

140 citations


Journal ArticleDOI
TL;DR: It is shown that this approach to clustering can be extended to analyse data with mixed categorical and continuous attributes and where some of the data are missing at random in the sense of Little and Rubin (Statistical Analysis with Missing Data).

107 citations


Journal ArticleDOI
TL;DR: In this paper, the authors developed a dynamic height growth model by using a simplified form of mixed effects modeling and subalpine fir (Abies lasiocarpa [Hook.] Nutt.) stem analysis data.
Abstract: I developed a dynamic height growth model by using a simplified form of mixed effects modeling and subalpine fir (Abies lasiocarpa [Hook.] Nutt.) stem analysis data. The new dynamic equation uses heights directly at any age to predict consistent heights (e.g., y_2 = f(t_2, t_1, y_1) ⇔ y_1 = f(t_1, t_2, y_2), and f(t_3, t_1, y_1) = f(t_3, t_2, f(t_2, t_1, y_1))), and therefore constitutes compatible site index and height models in one common equation. The parameters for the model were estimated by analysis of fixed and random effects with corrections for first- and second-order serial autocorrelation. The correction for second-order autocorrelation was necessary to assure the model's proper representation of the data and to remove a seeming cross-sectional autocorrelation across different sites/series. Estimating the errors in site indices as random effects eliminated the effects of stochastic predictive variables. The proposed model has outperformed all other base-age specific and base-age invariant models in both the fit to the data and in its behavior during extrapolations. It also outperforms the model (developed on amabilis fir data) that is currently used operationally for subalpine fir. The new model's advantages are parsimony, mathematical tractability, base-age invariance, and greater consistency in curvatures of the generated height-age trajectories. FOR. Sci. 49(4):539-554.
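The invariance properties y_1 = f(t_1, t_2, y_2) and f(t_3, t_1, y_1) = f(t_3, t_2, f(t_2, t_1, y_1)) can be checked numerically on any algebraic difference form; the power function below is an illustrative example, not the subalpine fir model itself:

```python
def height(t_target, t_base, y_base, b=1.3):
    """An illustrative base-age-invariant algebraic difference form:
    y2 = y1 * (t2 / t1) ** b, predicting height at t_target from a
    known height y_base at age t_base."""
    return y_base * (t_target / t_base) ** b

# Path invariance: predicting 10 -> 50 directly equals 10 -> 30 -> 50
direct = height(50, 10, 8.0)
chained = height(50, 30, height(30, 10, 8.0))
# Invertibility: projecting back to the base age recovers the base height
back = height(10, 50, direct)
```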

92 citations


Journal ArticleDOI
TL;DR: This work reanalyzes the data of Lam et al. using PROC MIXED in SAS and shows how to obtain the parameter estimates of interest with just a few lines of code, and extends the method to settings where the replicate measurements are not linked.
Abstract: The need to assess correlation in settings where multiple measurements are available on each of the variables of interest often arises in environmental science. However, this topic is not covered in introductory statistics texts. Although several ad hoc approaches can be used, they can easily lead to invalid conclusions and to a difficult choice of an appropriate measure of the correlation. Lam et al. approached this problem by using maximum likelihood estimation in cases where the replicate measurements are linked over time, but the method requires specialized software. We reanalyze the data of Lam et al. using PROC MIXED in SAS and show how to obtain the parameter estimates of interest with just a few lines of code. We then extend Lam et al.'s method to settings where the replicate measurements are not linked. Analysis of the unlinked case is illustrated with data from a study designed to assess correlations between indoor and outdoor measurements of benzene concentration in the air.
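For contrast, one of the ad hoc approaches the authors caution against, correlating subject means of the replicates, is easy to write down; the data here are hypothetical:

```python
import numpy as np

def corr_of_subject_means(x_reps, y_reps):
    """A naive baseline, not the mixed-model estimator of Lam et al.:
    average the replicate measurements per subject, then take the Pearson
    correlation of the subject means. Ignores measurement-error structure."""
    return np.corrcoef(x_reps.mean(axis=1), y_reps.mean(axis=1))[0, 1]

# Hypothetical paired readings, 5 subjects x 3 replicates, sharing a
# common subject-level signal plus small replicate noise
rng = np.random.default_rng(1)
truth = rng.normal(size=5)
x = truth[:, None] + rng.normal(scale=0.05, size=(5, 3))
y = truth[:, None] + rng.normal(scale=0.05, size=(5, 3))
r = corr_of_subject_means(x, y)
```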

90 citations


Journal ArticleDOI
TL;DR: The mixed model described a significant relationship between training and performance both for individuals and for groups of swimmers, different over the 3 yr.
Abstract: The training-performance relationship is particularly important for elite sports coaches who search for reproducible phenomena useful for organizing the athlete's training program. Many authors have studied the relative influence of training (7, 22, 23, 27) and found that reactions to training depend on volume, intensity and frequency of the training sessions (7, 16, 23). Others have reported divergent results (4, 9), perhaps related to the fact that delayed effects and inter-individual differences were not taken into account. For individual swimmers, mathematical models have been developed to describe the dynamic aspect of training and the consequences of succession of training loads over time (2). The Banister model (2, 3, 4) and its modifications (5, 6) are based on two antagonistic functions, both calculated from the training impulse. Studies on cellular adaptability reactions to exercise (3) have demonstrated that the negative function can be assimilated to a fatiguing impulse. The positive function can be compared to a fitness impulse resulting from the organism's adaptation to training. Expressed as an exponential, the functions account for the decreasing impact of the training effect. When iterative training sessions are considered, the time course of performance is described by: p_t = p_0 + k_a * Σ_{s=0}^{t−1} e^{−(t−s)/τ_a} w_s − k_f * Σ_{s=0}^{t−1} e^{−(t−s)/τ_f} w_s, where p_t is the known performance at week (or day) t; w_s is the known training load per week (or day) from the first week of training to the week (or day) preceding performance p_t; k_a and k_f are the fitness and fatigue multiplying factors, respectively; τ_a and τ_f are the fitness and fatigue decay time constants, respectively; and p_0 corresponds to an initial basic level of performance. There is no clear consensus on just how many data points are needed per parameter to ensure a stable solution in a regression analysis. Proposals reported in the literature have ranged from 5 to 50.
Stevens (26) recommends a nominal number of 15 observations per parameter (except the intercept parameter) for a multiple linear regression. But since the Banister model is a non-linear model, inference is based on asymptotic theory (10), which implies more data points per parameter than for a linear regression model. This means a large number of observations would be required to obtain precise results and enable pertinent statistical analysis. The Banister model also assumes the parameters remain constant over time, an assumption which is not consistent with observed time-dependent alterations in response to training (3, 4, 6, 22). When few repeated measurements are available for several subjects, mixed models provide an attractive solution (29). Instead of constructing a personal model for each subject, a model of population behavior is constructed, allowing parameters to vary from one individual to another, to take into account the heterogeneity between subjects. Particular care in characterizing random variation in the data is required to recognize two levels of variability: random variation among measurements within a given individual (intra-individual variation) and random variation among individuals (inter-individual variation) (10). In addition, mixed models analyze responses corresponding to different dose inputs (10), a common situation in swimming since training loads differ with age, specialty, and/or competition level (7, 23). The aim of the present study was to investigate the effect of training on performance of 13 elite swimmers taking into account A) individual profiles; B) sub-population profiles; and C) time effect over 3 seasons.
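The Banister performance equation itself is straightforward to evaluate; the loads and parameter values below are illustrative, not estimates from the swimmers studied:

```python
import math

def banister_performance(w, p0, ka, kf, tau_a, tau_f):
    """Performance at time t = len(w) given training loads w[0..t-1]:
    p_t = p0 + ka * sum_s e^{-(t-s)/tau_a} w_s
             - kf * sum_s e^{-(t-s)/tau_f} w_s"""
    t = len(w)
    fitness = sum(math.exp(-(t - s) / tau_a) * w[s] for s in range(t))
    fatigue = sum(math.exp(-(t - s) / tau_f) * w[s] for s in range(t))
    return p0 + ka * fitness - kf * fatigue

# Illustrative weekly training loads and parameters
loads = [100, 120, 90, 110, 80]
fitness_only = banister_performance(loads, 1000.0, 1.0, 0.0, 45.0, 15.0)
fatigue_only = banister_performance(loads, 1000.0, 0.0, 2.0, 45.0, 15.0)
```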

Journal ArticleDOI
TL;DR: A flexible mixed model approach to cosinor analysis is considered, where the fixed effect parameters may enter nonlinearly as acrophase and amplitude for each harmonic or linearly after transformation to regression coefficients; the authors clarify when the two methods are equivalent and when they differ.
Abstract: The cosinor model, used for variables governed by circadian and other biological rhythms, is a nonlinear model in the amplitude and acrophase parameters that has a linear representation upon transformation. With linear cosinor analysis, amplitude and acrophase for each harmonic can be computed as nonlinear functions of the estimated linear regression coefficients. Here a flexible mixed model approach to cosinor analysis is considered, where the fixed effect parameters may enter nonlinearly as acrophase and amplitude for each harmonic or linearly after transformation to regression coefficients. In addition, the random effects may enter nonlinearly as subject-specific deviations from the acrophases and amplitudes or linearly as subject-specific deviations from the regression coefficients. It is also possible for the fixed effects to enter nonlinearly while the random effects enter linearly. Additionally, we evaluate whether including higher order linear harmonic terms as random effects, that is, Rao-Khatri 'covariance adjustment', improves precision. Applying the delta method to nonlinear functions of the parameters from linear mixed cosinor models to obtain approximate variances produces results that are often identical to results from nonlinear mixed models. Consequently, traditional linear cosinor analysis can often be used to estimate and compare the nonlinear parameters of interest, that is, amplitudes and acrophases, via the delta method. This is advantageous since the nonlinear mixed model may have convergence difficulties for more complex models. However, for some multiple-group analyses, the linear cosinor transformation should not be used and we clarify when the two methods are equivalent and when they differ.
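The linear cosinor step can be sketched directly: fit cosine and sine regressors, then recover amplitude and acrophase as nonlinear functions of the coefficients. Noise-free data and illustrative parameter values are used so the recovery is exact:

```python
import numpy as np

# Single-harmonic cosinor: y = M + A*cos(w*t + phi), fitted linearly as
# y = M + b1*cos(w*t) + b2*sin(w*t), where b1 = A*cos(phi), b2 = -A*sin(phi)
w = 2 * np.pi / 24.0                      # 24-hour rhythm
t = np.arange(0, 48, 1.0)                 # two full cycles of hourly samples
M_true, A_true, phi_true = 10.0, 3.0, -1.0
y = M_true + A_true * np.cos(w * t + phi_true)

X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
M, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
A = np.hypot(b1, b2)                      # amplitude
phi = np.arctan2(-b2, b1)                 # acrophase
```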

Journal ArticleDOI
TL;DR: In this article, a multiple-trait reduced rank random regression test-day model was applied for the breeding value estimation for first parity milk, protein, and fat yield of Finnish dairy cattle.

Journal ArticleDOI
TL;DR: It is emphasized that statistical techniques and software to fit them are more widely available now, but that parameters have different interpretations in different model types, and the importance of focusing on choosing the most appropriate model for the specific purpose of the analysis is stressed.

Journal ArticleDOI
TL;DR: A frequentist's alternative to the recently developed hierarchical Bayes methods for small-area estimation using generalized linear mixed models, in which the asymptotic behavior of the relative savings loss demonstrates the superiority of the proposed EBP over the usual small area proportion.

01 Jan 2003
TL;DR: Zheng and Little, as mentioned in this paper, used penalized splines (p-splines) to model smoothly varying relationships between the outcome and the inclusion probabilities in one-stage probability proportional to size (PPS) samples.
Abstract: Samplers often distrust model-based approaches to survey inference due to concerns about model misspecification when applied to large samples from complex populations. We suggest that the model-based paradigm can work very successfully in survey settings, provided models are chosen that take into account the sample design and avoid strong parametric assumptions. The Horvitz-Thompson (HT) estimator is a simple design-unbiased estimator of the finite population total in probability sampling designs. From a modeling perspective, the HT estimator performs well when the ratios of the outcome values and the inclusion probabilities are exchangeable. When this assumption is not met, the HT estimator can be very inefficient. In Zheng and Little (2002a, 2002b) we used penalized splines (p-splines) to model smoothly varying relationships between the outcome and the inclusion probabilities in one-stage probability proportional to size (PPS) samples. We showed that p-spline model-based estimators are in general more efficient than the HT estimator, and can be used to provide narrower confidence intervals with close to nominal confidence coverage. In this article, we extend this approach to two-stage sampling designs. We use a p-spline based mixed model that fits a nonparametric relationship between the primary sampling unit (PSU) means and a measure of PSU size, and incorporates random effects to model clustering. For variance estimation we consider the empirical Bayes model-based variance, the jackknife and balanced repeated replication. Simulation studies on simulated data and on samples drawn from public use microdata in the 1990 census demonstrate gains for the model-based p-spline estimator over the HT estimator and linear model-assisted estimators. Simulations also show the variance estimation methods yield confidence intervals with satisfactory confidence coverage.
Interestingly, these gains can be seen in an equal probability design, where the first stage selection is PPS and the second stage selection probabilities are proportional to the inverse of the first stage inclusion probabilities, and the HT estimator leads to the unweighted mean. In situations that most favor the HT estimator, the model-based estimators have comparable efficiency. (Draft of March 14, 2003: "Penalized Spline Nonparametric Mixed Models for Inference About a Finite Population Mean from Two-Stage Samples," Hui Zheng and Roderick Little, Department of Biostatistics, University of Michigan.)
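The Horvitz-Thompson estimator at the center of the comparison is simply the inverse-probability-weighted total; a minimal sketch with an equal-probability sample, where it reduces to N times the sample mean:

```python
def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimator of a finite population total: the sum of
    sampled values weighted by their inverse inclusion probabilities."""
    return sum(yi / p for yi, p in zip(y, pi))

# Equal-probability sample of n=4 from N=100: pi_i = n/N = 0.04, so the
# HT estimator equals N * (sample mean). Values are illustrative.
y = [10.0, 12.0, 8.0, 14.0]
pi = [0.04] * 4
total_hat = horvitz_thompson_total(y, pi)
```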

Journal ArticleDOI
TL;DR: It is shown that identification of pertinent factors that influence hospital LOS can provide important information for health care planning and resource allocation and the development of the class of finite mixture GLMM.

Journal ArticleDOI
TL;DR: In this article, the authors used a mixed model in which between-haul variation in selection is treated as a random effect, and the model was combined over hauls to estimate a mean selection curve.
Abstract: Parametric size-selection curves are often combined over hauls to estimate a mean selection curve using a mixed model in which between-haul variation in selection is treated as a random effect. Thi...

Journal ArticleDOI
TL;DR: This paper investigates conditions for propriety of the proposed priors for the class of GLMMs and shows that they are proper under some very general conditions and examines recent computational methods for performing Gibbs sampling for this class of models.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of constructing confidence intervals for a linear regression model with unbalanced nested error structure and examine the ability of MIXED to produce confidence intervals that maintain the stated confidence coefficient.
Abstract: In this article we consider the problem of constructing confidence intervals for a linear regression model with unbalanced nested error structure. A popular approach is the likelihood-based method employed by PROC MIXED of SAS. In this article, we examine the ability of MIXED to produce confidence intervals that maintain the stated confidence coefficient. Our results suggest that intervals for the regression coefficients work well, but intervals for the variance component associated with the primary level cannot be recommended. Accordingly, we propose alternative methods for constructing confidence intervals on the primary level variance component. Computer simulation is used to compare the proposed methods. A numerical example and SAS code are provided to demonstrate the methods.

Journal ArticleDOI
TL;DR: In this article, a nonparametric approach is developed to estimate parameters in nonlinear mixed effects models and asymptotic properties of the non-parametric maximum likelihood estimators and associated computational algorithms are provided.
Abstract: A nonparametric approach is developed herein to estimate parameters in nonlinear mixed effects models. Asymptotic properties of the nonparametric maximum likelihood estimators and associated computational algorithms are provided. Empirical Bayes estimators of functionals of the random effects are also developed. Applications to population pharmacokinetics are given.

Journal ArticleDOI
TL;DR: In this article, a hybrid method that combines Laplace's approximation and Monte Carlo simulations to evaluate integrals in the likelihood function is proposed for estimation of the parameters in nonlinear mixed effects models that assume a normal parametric family for the random effects.
Abstract: A hybrid method that combines Laplace's approximation and Monte Carlo simulations to evaluate integrals in the likelihood function is proposed for estimation of the parameters in nonlinear mixed effects models that assume a normal parametric family for the random effects. Simulations show that these parametric estimates of fixed effects are close to the nonparametric estimates even though the mixing distribution is far from the assumed normal parametric family. An asymptotic theory of this hybrid method for parametric estimation without requiring the true mixing distribution to belong to the assumed parametric family is developed to explain these results. This hybrid method and its asymptotic theory are also extended to generalised linear mixed effects models.
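The Laplace half of such a hybrid can be sketched on a one-dimensional integral of the form ∫ exp(n·g(θ)) dθ; on a Gaussian integrand the approximation is exact, which makes a convenient check:

```python
import math

def laplace_integral(g, g2, theta_hat, n):
    """Laplace approximation of I = ∫ exp(n*g(theta)) dtheta expanded
    around the mode theta_hat:
    I ≈ exp(n*g(theta_hat)) * sqrt(2*pi / (n*|g''(theta_hat)|))."""
    return math.exp(n * g(theta_hat)) * math.sqrt(
        2 * math.pi / (n * abs(g2(theta_hat))))

# Gaussian check: ∫ exp(-n*theta^2/2) dtheta = sqrt(2*pi/n), mode at 0
g = lambda th: -0.5 * th ** 2
g2 = lambda th: -1.0
approx = laplace_integral(g, g2, theta_hat=0.0, n=10)
exact = math.sqrt(2 * math.pi / 10)
```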

Journal ArticleDOI
TL;DR: In this paper, the problem of estimating the fixed effects parameters and the variance components in logistic mixed models is considered, and a generalized estimating equation approach (GEE) is proposed.
Abstract: In this article, the problem of estimating the fixed effects parameters and the variance components in logistic mixed models is considered. It is well known that estimating these parameters by the method of maximum likelihood faces computational difficulties. As an alternative, we propose the generalized estimating equations approach (denoted GEE) defined by Liang and Zeger. The estimators obtained are consistent and asymptotically normal. We illustrate this method with simulations and with an analysis of real data on quality of life.
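With an independence working correlation, the GEE for a marginal logistic mean model reduces to iteratively reweighted least squares; the sketch below shows only this mean-model step under simulated data, not the variance-component estimation of the paper:

```python
import numpy as np

def gee_logistic_independence(X, y, n_iter=25):
    """Solve the estimating equation X'(y - mu(beta)) = 0 for a marginal
    logistic model by Newton steps (equivalent to IRLS under an
    independence working correlation)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        W = mu * (1.0 - mu)                       # working variances
        step = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))
        beta = beta + step
    return beta

# Hypothetical data generated with intercept 0.5 and slope 1.0
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * X[:, 1])))
y = rng.binomial(1, p)
beta_hat = gee_logistic_independence(X, y)
```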

Journal ArticleDOI
TL;DR: The authors generalizes Druilhet's results on optimality of circular neighbor balanced block designs under the model with fixed neighbor effects and shows that some of these designs are also optimal under random neighbor effects.

Journal ArticleDOI
TL;DR: In this article, the authors compare the performance of a repeated measures analysis and a longitudinal mixed model for postharvest quality assessment of tomato cultivars, showing that the flexibility of such a mixed model, both towards the repeated measures design of the experiments as towards the large product variability inherent to these horticultural products, is an important advantage over classical techniques.
Abstract: In the field of postharvest quality assessment of horticultural products, research on the development of non-destructive quality sensors, replacing destructive and often time-consuming sensors, has surged in the last decade, offering the possibility of taking repeated quality measurements on the same product. Repeated measures analysis has gained importance in recent years, and several software packages offer a broad class of routines. A dataset dealing with the postharvest quality evolution of different tomato cultivars serves as a practical example for the comparison and discussion of four different statistical model types. Starting from an analysis at each time point and an ordinary least squares regression model as standard and widely used methods, this contribution compares these two methods to a repeated measures analysis and a longitudinal mixed model. It is shown that the flexibility of such a mixed model, both to the repeated measures design of the experiments and to the large product variability inherent to these horticultural products, is an important advantage over classical techniques. This research shows that different conclusions could be drawn depending on which technique is used, owing to the basic assumptions of each model, which are not always fulfilled. The results further demonstrate the flexibility of the mixed model concept. Using a mixed model for repeated measures, the different sources of variability, namely inter-tomato variability, intra-tomato variability and measurement error, were characterized, which is of great benefit to the researcher.

Journal ArticleDOI
TL;DR: In this article, a generalized quasilikelihood approach is proposed to analyze the repeated familial data based on the familial structure caused by gamma random effects, which provides estimates of the regression parameters and the variance component of the random effects after taking the longitudinal correlations of the data into account.

Journal ArticleDOI
TL;DR: The simulation study indicated that there is a consistent advantage in using the Bayesian method to detect a “correct” model under certain specifications of additive genetics and common environmental effects, and both methods had difficulty in detecting the correct model when the additive genetic effect was low or of moderate range.
Abstract: We compare Bayesian methodology utilizing free-ware BUGS (Bayesian Inference Using Gibbs Sampling) with the traditional structural equation modelling approach based on another free-ware package, Mx. Dichotomous and ordinal (three category) twin data were simulated according to different additive genetic and common environment models for phenotypic variation. Practical issues are discussed in using Gibbs sampling as implemented by BUGS to fit subject-specific Bayesian generalized linear models, where the components of variation may be estimated directly. The simulation study (based on 2000 twin pairs) indicated that there is a consistent advantage in using the Bayesian method to detect a “correct” model under certain specifications of additive genetics and common environmental effects. For binary data, both methods had difficulty in detecting the correct model when the additive genetic effect was low (between 10 and 20%) or of moderate range (between 20 and 40%). Furthermore, neither method could adequately detect a correct model that included a modest common environmental effect (20%) even when the additive genetic effect was large (50%). Power was significantly improved with ordinal data for most scenarios, except for the case of low heritability under a true ACE model. We illustrate and compare both methods using data from 1239 twin pairs over the age of 50 years, who were registered with the Australian National Health and Medical Research Council Twin Registry (ATR) and presented symptoms associated with osteoarthritis occurring in joints of the hand.

Journal ArticleDOI
TL;DR: In this article, the authors compare ANOVA and nonparametric methods based on ranks to the threshold model, and employ a flexible extension of a threshold model with a non-normal underlying latent variable to account for nonnormality.

Journal ArticleDOI
TL;DR: In this article, the authors extend Cook's local influence analysis to the penalized Gaussian likelihood estimator that uses a smoothing spline as a solution to its non-parametric component.
Abstract: Partially linear models are extensions of linear models to include a non-parametric function of some covariate. They have been found to be useful in both cross-sectional and longitudinal studies. This paper provides a convenient means to extend Cook's local influence analysis to the penalized Gaussian likelihood estimator that uses a smoothing spline as a solution to its non-parametric component. Insight is also provided into the interplay of the influence or leverage measures between the linear and the non-parametric components in the model. The diagnostics are applied to a mouthwash data set and a longitudinal hormone study with informative results.

Key words: diagnostics, local influence, longitudinal data, mixed model, partially linear, penalized likelihood, regression, smoothing spline

1. Introduction

A regression model is often used as a convenient device for understanding the main features of data, but it is not meant to be exact in characterizing the relationship between a response variable y and other variables x. In any serious data analysis, one would ask how much the analysis can be influenced by minor perturbations of the model. Case deletion, as proposed by Cook (1977), is a well-known tool for assessing the impact of each observation on the parameter estimate. This approach for diagnostics has been well accepted in the statistics community, and Cook's distance has a clear interpretation and implication. However, case deletion does not directly reflect the impact of other perturbations of the model. To supplement the case deletion approach, Cook (1986) developed a local influence approach based on the sensitivity of the log-likelihood against small perturbations in part of the model. The local influence analysis does not involve recomputing the parameter estimates for every case deletion, so it is often computationally simpler. Furthermore, it permits perturbation of various aspects of the model to tell us more than what the case deletion approach is designed for. For example, it can help measure the leverage of a design point and evaluate the impact of a small measurement error of x on our estimates. This approach has been extended to generalized linear models by Thomas & Cook (1989), to restricted likelihood models by Kwan & Fung (1998), and to non-linear models by St. Laurent & Cook (1993). Lesaffre & Verbeke (1998) made a thorough investigation of local influence analysis in linear mixed models. Ouwens et al. (2001) extended the work to generalized linear mixed models. Thomas (1991) constructed local influence diagnostics for the smoothing parameter in smoothing splines. Lu et al. (1997) studied a standardized influence matrix that is related to
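The case-deletion baseline that this abstract contrasts with local influence is easy to make concrete. The sketch below computes Cook's (1977) distance for a simple linear regression on a hypothetical data set with one injected outlier; it illustrates the diagnostic being supplemented, not the paper's local influence machinery for partially linear models.

```python
import math

def simple_ols(x, y):
    """OLS fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1

def cooks_distance(x, y):
    """Cook's (1977) case-deletion influence for simple linear regression:
    D_i = e_i^2 * h_ii / (p * s^2 * (1 - h_ii)^2), with p = 2 parameters."""
    n = len(x)
    b0, b1 = simple_ols(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    s2 = sum(e * e for e in resid) / (n - 2)          # residual variance
    dist = []
    for xi, ei in zip(x, resid):
        h = 1.0 / n + (xi - mx) ** 2 / sxx            # leverage of point i
        dist.append(ei * ei * h / (2.0 * s2 * (1.0 - h) ** 2))
    return dist

# A clean line y = 1 + 2x with one gross outlier injected at index 5:
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[5] += 10.0
D = cooks_distance(x, y)
```

The perturbed observation dominates the Cook's distances, flagging it for inspection. Local influence generalizes this idea by perturbing the model continuously (case weights, measurement error in x, and so on) instead of deleting cases one at a time, which is what the paper carries over to the spline-based partially linear setting.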

Journal ArticleDOI
TL;DR: In this article, the authors extend existing calibration techniques to nonlinear mixed effects models, and illustrate the procedure using data from a time-of-weed-removal study, which is designed to estimate the critical period of weed control.
Abstract: Many disciplines conduct studies in which the primary objectives depend on inference based on a nonlinear relationship between the treatment and response. In particular, interest often focuses on calibration—that is, estimating the best treatment level to achieve a particular result. Often, data for such calibration come from experiments with split-plots or other features that result in multiple error terms or other nontrivial error structures. One such example is the time-of-weed-removal study in weed science, designed to estimate the critical period of weed control. Calibration, or inverse prediction, is not a trivial problem with simple linear regression, and the complexities of experiments such as the time-of-weed-removal study further complicate the procedure. In this article, we extend existing calibration techniques to nonlinear mixed effects models, and illustrate the procedure using data from a time-of-weed-removal study.
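The calibration step itself, stripped of the mixed-effects error structure the paper handles, is inverse prediction on a monotone fitted curve. The sketch below uses a hypothetical logistic yield-loss curve (parameters invented for illustration, not taken from any time-of-weed-removal data) and solves for the removal time that holds yield loss at a target level:

```python
import math

def yield_loss(t, k=0.4, t0=20.0):
    """Hypothetical logistic yield-loss curve: percent yield loss as a
    function of weed-removal time t (days after crop emergence)."""
    return 100.0 / (1.0 + math.exp(-k * (t - t0)))

def calibrate(target, lo=0.0, hi=60.0, tol=1e-8):
    """Inverse prediction by bisection: find the t at which the fitted
    curve equals the target loss. Valid because the curve is monotone
    increasing in t on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if yield_loss(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Latest removal time that keeps yield loss at 5%, i.e. the onset of the
# critical period under this hypothetical curve:
t_crit = calibrate(5.0)
```

The hard part the article addresses is not this root-finding step but attaching valid uncertainty to t_crit when the curve's parameters are estimated from a split-plot experiment with multiple error terms, which is why the authors extend calibration theory to nonlinear mixed effects models.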