
Showing papers on "Random effects model published in 1999"


Book
01 Jan 1999
TL;DR: This book provides an introduction to multilevel analysis for clustered and hierarchically structured data, developing random intercept and random slope (hierarchical linear) models and covering estimation, testing, model specification, and within- and between-group relations.
Abstract: (Table of contents) Preface to second edition; Preface to first edition.
Introduction: Multilevel analysis; Probability models; This book; Prerequisites; Notation.
Multilevel Theories, Multi-Stage Sampling and Multilevel Models: Dependence as a nuisance; Dependence as an interesting phenomenon; Macro-level, micro-level, and cross-level relations; Glommary.
Statistical Treatment of Clustered Data: Aggregation; Disaggregation; The intraclass correlation; Within-group and between-group variance; Testing for group differences; Design effects in two-stage samples; Reliability of aggregated variables; Within- and between-group relations; Regressions; Correlations; Estimation of within- and between-group correlations; Combination of within-group evidence; Glommary.
The Random Intercept Model: Terminology and notation; A regression model: fixed effects only; Variable intercepts: fixed or random parameters?; When to use random coefficient models; Definition of the random intercept model; More explanatory variables; Within- and between-group regressions; Parameter estimation; 'Estimating' random group effects: posterior means; Posterior confidence intervals; Three-level random intercept models; Glommary.
The Hierarchical Linear Model: Random slopes; Heteroscedasticity; Do not force τ01 to be 0!; Interpretation of random slope variances; Explanation of random intercepts and slopes; Cross-level interaction effects; A general formulation of fixed and random parts; Specification of random slope models; Centering variables with random slopes?; Estimation; Three or more levels; Glommary.
Testing and Model Specification: Tests for fixed parameters; Multiparameter tests for fixed effects; Deviance tests; More powerful tests for variance parameters; Other tests for parameters in the random part; Confidence intervals for parameters in the random part; Model specification; Working upward from level one; Joint consideration of level-one and level-two variables; Concluding remarks on model specification; Glommary.
How Much Does the Model Explain?: Explained variance; Negative values of R2?; Definition of the proportion of explained variance in two-level models; Explained variance in three-level models; Explained variance in models with random slopes; Components of variance; Random intercept models; Random slope models; Glommary.
Heteroscedasticity: Heteroscedasticity at level one; Linear variance functions; Quadratic variance functions; Heteroscedasticity at level two; Glommary.
Missing Data: General issues for missing data; Implications for design; Missing values of the dependent variable; Full maximum likelihood; Imputation; The imputation method; Putting together the multiple results; Multiple imputations by chained equations; Choice of the imputation model; Glommary.
Assumptions of the Hierarchical Linear Model: Assumptions of the hierarchical linear model; Following the logic of the hierarchical linear model; Include contextual effects; Check whether variables have random effects; Explained variance; Specification of the fixed part; Specification of the random part; Testing for heteroscedasticity; What to do in case of heteroscedasticity; Inspection of level-one residuals; Residuals at level two; Influence of level-two units; More general distributional assumptions; Glommary.
Designing Multilevel Studies: Some introductory notes on power; Estimating a population mean; Measurement of subjects; Estimating association between variables; Cross-level interaction effects; Allocating treatment to groups or individuals; Exploring the variance structure; The intraclass correlation; Variance parameters; Glommary.
Other Methods and Models: Bayesian inference; Sandwich estimators for standard errors; Latent class models; Glommary.
Imperfect Hierarchies: A two-level model with a crossed random factor; Crossed random effects in three-level models; Multiple membership models; Multiple membership multiple classification models; Glommary.
Survey Weights: Model-based and design-based inference; Descriptive and analytic use of surveys; Two kinds of weights; Choosing between model-based and design-based analysis; Inclusion probabilities and two-level weights; Exploring the informativeness of the sampling design; Example: Metacognitive strategies as measured in the PISA study; Sampling design; Model-based analysis of data divided into parts; Inclusion of weights in the model; How to assign weights in multilevel models; Appendix: Matrix expressions for the single-level estimators; Glommary.
Longitudinal Data: Fixed occasions; The compound symmetry models; Random slopes; The fully multivariate model; Multivariate regression analysis; Explained variance; Variable occasion designs; Populations of curves; Random functions; Explaining the functions; Changing covariates; Autocorrelated residuals; Glommary.
Multivariate Multilevel Models: Why analyze multiple dependent variables simultaneously?; The multivariate random intercept model; Multivariate random slope models; Glommary.
Discrete Dependent Variables: Hierarchical generalized linear models; Introduction to multilevel logistic regression; Heterogeneous proportions; The logit function: Log-odds; The empty model; The random intercept model; Estimation; Aggregation; Further topics on multilevel logistic regression; Random slope model; Representation as a threshold model; Residual intraclass correlation coefficient; Explained variance; Consequences of adding effects to the model; Ordered categorical variables; Multilevel event history analysis; Multilevel Poisson regression; Glommary.
Software: Special software for multilevel modeling; HLM; MLwiN; The MIXOR suite and SuperMix; Modules in general-purpose software packages; SAS procedures VARCOMP, MIXED, GLIMMIX, and NLMIXED; R; Stata; SPSS, commands VARCOMP and MIXED; Other multilevel software: PinT, Optimal Design, MLPowSim, Mplus, Latent Gold, REALCOM, WinBUGS.
References. Index.
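
As a point of reference for the chapter listing above, the two-level random intercept model at the core of the book can be written, in generic notation rather than necessarily the book's own, as

    Y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + U_{0j} + R_{ij}, \qquad U_{0j} \sim N(0, \tau_0^2), \quad R_{ij} \sim N(0, \sigma^2),

so the intraclass correlation treated in the early chapters is \rho = \tau_0^2 / (\tau_0^2 + \sigma^2); adding a random slope term U_{1j} x_{ij} gives the hierarchical linear model of the later chapters.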

9,578 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compared a number of methods which can be used to investigate whether a particular covariate, with a value defined for each study in the meta-analysis, explains any heterogeneity.
Abstract: Exploring the possible reasons for heterogeneity between studies is an important aspect of conducting a meta-analysis. This paper compares a number of methods which can be used to investigate whether a particular covariate, with a value defined for each study in the meta-analysis, explains any heterogeneity. The main example is from a meta-analysis of randomized trials of serum cholesterol reduction, in which the log-odds ratio for coronary events is related to the average extent of cholesterol reduction achieved in each trial. Different forms of weighted normal errors regression and random effects logistic regression are compared. These analyses quantify the extent to which heterogeneity is explained, as well as the effect of cholesterol reduction on the risk of coronary events. In a second example, the relationship between treatment effect estimates and their precision is examined, in order to assess the evidence for publication bias. We conclude that methods which allow for an additive component of residual heterogeneity should be used. In weighted regression, a restricted maximum likelihood estimator is appropriate, although a number of other estimators are also available. Methods which use the original form of the data explicitly, for example the binomial model for observed proportions rather than assuming normality of the log-odds ratios, are now computationally feasible. Although such methods are preferable in principle, they often give similar results in practice.
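
A hedged sketch of the random-effects meta-regression model that the comparison centres on (standard notation; the paper's own parameterization may differ):

    \hat{\theta}_i \sim N(\theta_i, v_i), \qquad \theta_i = \beta_0 + \beta_1 x_i + u_i, \quad u_i \sim N(0, \tau^2),

where \hat{\theta}_i is the log-odds ratio from study i, x_i the study-level covariate (here the average cholesterol reduction achieved), v_i the within-study variance, and \tau^2 the additive residual heterogeneity, estimated for example by restricted maximum likelihood in the weighted-regression formulation.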

1,527 citations


Journal ArticleDOI
TL;DR: In this paper, the cubic smoothing spline is used in conjunction with fixed and random effects, random coefficients and variance modelling to provide simultaneous modelling of trends and covariance structure, which allows coherent and flexible empirical model building in complex situations.
Abstract: In designed experiments and in particular longitudinal studies, the aim may be to assess the effect of a quantitative variable such as time on treatment effects. Modelling treatment effects can be complex in the presence of other sources of variation. Three examples are presented to illustrate an approach to analysis in such cases. The first example is a longitudinal experiment on the growth of cows under a factorial treatment structure where serial correlation and variance heterogeneity complicate the analysis. The second example involves the calibration of optical density and the concentration of a protein DNase in the presence of sampling variation and variance heterogeneity. The final example is a multienvironment agricultural field experiment in which a yield-seeding rate relationship is required for several varieties of lupins. Spatial variation within environments, heterogeneity between environments and variation between varieties all need to be incorporated in the analysis. In this paper, the cubic smoothing spline is used in conjunction with fixed and random effects, random coefficients and variance modelling to provide simultaneous modelling of trends and covariance structure. The key result that allows coherent and flexible empirical model building in complex situations is the linear mixed model representation of the cubic smoothing spline. An extension is proposed in which trend is partitioned into smooth and nonsmooth components. Estimation and inference, the analysis of the three examples and a discussion of extensions and unresolved issues are also presented.
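
The key result referred to, the linear mixed model representation of the cubic smoothing spline, can be sketched in generic notation as

    y = X\beta + Zu + e, \qquad u \sim N(0, \sigma_s^2 G_s), \quad e \sim N(0, \sigma^2 R),

where X carries the overall linear trend as fixed effects, Z is a known matrix built from the spline basis at the observed covariate values, and the smoothing parameter corresponds to the variance ratio \lambda = \sigma^2/\sigma_s^2, so the degree of smoothing is estimated by REML together with the other variance components. This is the standard representation; the paper's exact matrices differ in detail.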

594 citations


Journal ArticleDOI
TL;DR: The results show that the asymptotic DerSimonian and Laird Q statistic and the bootstrap versions of the other tests give the correct type I error under the null hypothesis but that all of the tests considered have low statistical power, especially when the number of studies included in the meta-analysis is small.
Abstract: The identification of heterogeneity in effects between studies is a key issue in meta-analyses of observational studies, since it is critical for determining whether it is appropriate to pool the individual results into one summary measure. The result of a hypothesis test is often used as the decision criterion. In this paper, the authors use a large simulation study patterned from the key features of five published epidemiologic meta-analyses to investigate the type I error and statistical power of five previously proposed asymptotic homogeneity tests, a parametric bootstrap version of each of the tests, and tau2-bootstrap, a test proposed by the authors. The results show that the asymptotic DerSimonian and Laird Q statistic and the bootstrap versions of the other tests give the correct type I error under the null hypothesis but that all of the tests considered have low statistical power, especially when the number of studies included in the meta-analysis is small (<20). From the point of view of validity, power, and computational ease, the Q statistic is clearly the best choice. The authors found that the performance of all of the tests considered did not depend appreciably upon the value of the pooled odds ratio, both for size and for power. Because tests for heterogeneity will often be underpowered, random effects models can be used routinely, and heterogeneity can be quantified by means of R(I), the proportion of the total variance of the pooled effect measure due to between-study variance, and CV(B), the between-study coefficient of variation.
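
For reference, a minimal sketch of the asymptotic Q statistic and the DerSimonian and Laird moment estimator of between-study variance discussed above (generic formulas, not the authors' code; names are illustrative):

    import numpy as np
    from scipy import stats

    def q_test(effects, variances):
        """Cochran's Q homogeneity test plus the DerSimonian-Laird tau^2 estimate."""
        effects = np.asarray(effects, dtype=float)
        w = 1.0 / np.asarray(variances, dtype=float)      # inverse-variance weights
        theta_fixed = np.sum(w * effects) / np.sum(w)     # fixed-effect pooled estimate
        q = np.sum(w * (effects - theta_fixed) ** 2)      # Q statistic
        k = len(effects)
        p_value = stats.chi2.sf(q, df=k - 1)              # asymptotic chi-square reference
        # DerSimonian-Laird moment estimate of the between-study variance
        tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
        return q, p_value, tau2

With log odds ratios and their standard errors this would be called as q_test(log_or, se**2); the low power reported for small numbers of studies refers to tests of this kind.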

351 citations


Journal ArticleDOI
TL;DR: This tutorial provides an introduction to hierarchical linear models in general terms, then specifies model notation and assumptions in detail, elaborates on model interpretation, and provides guidelines for model checking.
Abstract: Hierarchical linear models are useful for understanding relationships in hierarchical data structures, such as patients within hospitals or physicians within hospitals. In this tutorial we provide an introduction to the technique in general terms, and then specify model notation and assumptions in detail. We describe estimation techniques and hypothesis testing procedures for the three types of parameters involved in hierarchical linear models: fixed effects, covariance components, and random effects. We illustrate the application using an example from the Type II Diabetes Patient Outcomes Research Team (PORT) study and use two popular PC-based statistical computing packages, HLM/2L and SAS Proc Mixed, to perform two-level hierarchical analysis. We compare output from the two packages applied to our example data as well as to simulated data. We elaborate on model interpretation and provide guidelines for model checking.
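
The tutorial itself works with HLM/2L and SAS Proc Mixed; as a rough modern analogue, a two-level random intercept model of the kind described could be fitted in Python with statsmodels (variable and file names below are hypothetical, not taken from the PORT study):

    import pandas as pd
    import statsmodels.formula.api as smf

    # One row per patient; 'hospital' identifies the level-two units (hypothetical data)
    df = pd.read_csv("two_level_example.csv")

    # Random intercept for hospital; re_formula="~x" would add a random slope for x
    model = smf.mixedlm("y ~ x", data=df, groups=df["hospital"])
    result = model.fit(reml=True)
    print(result.summary())   # fixed effects and variance components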

244 citations


Journal ArticleDOI
TL;DR: In this manuscript, an alternative parameterization of the logistic-normal random effects model is adopted, and both likelihood and estimating equation approaches to parameter estimation are studied.
Abstract: Likelihood-based inference for longitudinal binary data can be obtained using a generalized linear mixed model (Breslow, N. and Clayton, D. G., 1993, Journal of the American Statistical Association 88, 9–25; Wolfinger, R. and O'Connell, M., 1993, Journal of Statistical Computation and Simulation 48, 233–243), given the recent improvements in computational approaches. Alternatively, Fitzmaurice and Laird (1993, Biometrika 80, 141–151), Molenberghs and Lesaffre (1994, Journal of the American Statistical Association 89, 633–644), and Heagerty and Zeger (1996, Journal of the American Statistical Association 91, 1024–1036) have developed likelihood-based inference that adopts a marginal mean regression parameter and completes full specification of the joint multivariate distribution through canonical and/or marginal higher moment assumptions. Each of these marginal approaches is computationally intense and currently limited to small cluster sizes. In this manuscript, an alternative parameterization of the logistic-normal random effects model is adopted, and both likelihood and estimating equation approaches to parameter estimation are studied. A key feature of the proposed approach is that marginal regression parameters are adopted that still permit individual-level predictions or contrasts. An example is presented where scientific interest is in both the mean response and the covariance among repeated measurements.

219 citations


Journal ArticleDOI
TL;DR: A Monte Carlo study of the problem of testing for a centre effect in multi-centre studies following a proportional hazards regression analysis shows that for moderate samples the fixed effects tests have nominal levels much higher than specified, but the random effect test performs as expected under the null hypothesis.
Abstract: The problem of testing for a centre effect in multi-centre studies following a proportional hazards regression analysis is considered. Two approaches to the problem can be used. One fits a proportional hazards model with a fixed covariate included for each centre (except one). The need for a centre specific adjustment is evaluated using either a score, Wald or likelihood ratio test of the hypothesis that all the centre specific effects are equal to zero. An alternative approach is to introduce a random effect or frailty for each centre into the model. Recently, Commenges and Andersen have proposed a score test for this random effects model. By a Monte Carlo study we compare the performance of these two approaches when either the fixed or random effects model holds true. The study shows that for moderate samples the fixed effects tests have nominal levels much higher than specified, but the random effect test performs as expected under the null hypothesis. Under the alternative hypothesis the random effect test has good power to detect relatively small fixed or random centre effects. Also, if the centre effect is ignored the estimator of the main treatment effect may be quite biased and is inconsistent. The tests are illustrated on a retrospective multi-centre study of recovery from bone marrow transplantation.
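
In generic notation the two approaches being compared can be sketched as follows (a hedged reconstruction, not the authors' exact formulation):

    \text{fixed effects:}\quad \lambda_{ij}(t) = \lambda_0(t)\exp(\beta' x_{ij} + \alpha_j), \text{ with one fixed } \alpha_j \text{ per centre (one set to zero)},
    \text{random effect (frailty):}\quad \lambda_{ij}(t) = \lambda_0(t)\exp(\beta' x_{ij} + b_j), \qquad b_j \sim N(0, \sigma^2),

where the score test of Commenges and Andersen addresses the null hypothesis \sigma^2 = 0, that is, no centre effect.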

210 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider a Bayesian hierarchical linear mixed model where the fixed effects have a vague prior such as a constant prior and the random effect follows a class of CAR(1) models including those whose joint prior distribution of the regional effects is improper.
Abstract: We examine properties of the conditional autoregressive model, or CAR(1) model, which is commonly used to represent regional effects in Bayesian analyses of mortality rates. We consider a Bayesian hierarchical linear mixed model where the fixed effects have a vague prior such as a constant prior and the random effect follows a class of CAR(1) models including those whose joint prior distribution of the regional effects is improper. We give sufficient conditions for the existence of the posterior distribution of the fixed and random effects and variance components. We then prove the necessity of the conditions and give a one-way analysis of variance example where the posterior may or may not exist. Finally, we extend the result to the generalised linear mixed model, which includes as a special case the Poisson log-linear model commonly used in disease mapping.
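
A minimal sketch of the CAR structure at issue (the standard conditional form; the class considered in the paper is broader):

    \phi_i \mid \phi_{-i} \sim N\!\left(\rho \,\frac{\sum_j w_{ij}\phi_j}{w_{i+}},\; \frac{\sigma^2}{w_{i+}}\right), \qquad w_{i+} = \sum_j w_{ij},

with w_{ij} the adjacency weights between regions; at \rho = 1 (the intrinsic CAR) the joint prior of the regional effects is improper, which is exactly the case whose posterior propriety the paper characterizes.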

168 citations


Journal ArticleDOI
TL;DR: In this paper, alternative test statistics are presented and better approximating test distributions are derived, and the methods in the unbalanced heteroscedastic 1-way random ANOVA model and for the probability difference method are discussed.
Abstract: In many fields of application, test statistics are obtained by combining estimates from several experiments, studies or centres of a multi-centre trial. The commonly used test procedure for judging the evidence of a common overall effect can result in a considerable overestimation of the significance level, leading to a high rate of overly liberal decisions. Alternative test statistics are presented and better approximating test distributions are derived. The methods are discussed explicitly for the unbalanced heteroscedastic one-way random ANOVA model and for the probability difference method, including treatment-by-centre or treatment-by-study interaction. Numerical results from simulation studies are presented.

152 citations


Journal ArticleDOI
TL;DR: Permutation and ad hoc methods for testing with the random effects model are proposed; the permutation method theoretically controls the type I error rate for typical meta-analysis scenarios.
Abstract: The standard approach to inference for random effects meta-analysis relies on approximating the null distribution of a test statistic by a standard normal distribution. This approximation is asymptotic on k, the number of studies, and can be substantially in error in medical meta-analyses, which often have only a few studies. This paper proposes permutation and ad hoc methods for testing with the random effects model. Under the group permutation method, we randomly switch the treatment and control group labels in each trial. This idea is similar to using a permutation distribution for a community intervention trial where communities are randomized in pairs. The permutation method theoretically controls the type I error rate for typical meta-analyses scenarios. We also suggest two ad hoc procedures. Our first suggestion is to use a t-reference distribution with k-1 degrees of freedom rather than a standard normal distribution for the usual random effects test statistic. We also investigate the use of a simple t-statistic on the reported treatment effects.
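
A rough sketch of the group-label permutation idea: for a symmetric effect measure such as a log-odds ratio, switching the treatment and control labels within a trial flips the sign of its estimate, so a reference distribution can be built from sign flips. The code below is illustrative, not the authors' implementation; for simplicity tau^2 is held fixed, whereas recomputing it for each permuted dataset is a closer analogue of the full procedure.

    import numpy as np

    def permutation_p_value(effects, variances, tau2, n_perm=5000, seed=0):
        """Two-sided permutation p-value for the usual random-effects z statistic."""
        rng = np.random.default_rng(seed)
        effects = np.asarray(effects, dtype=float)
        w = 1.0 / (np.asarray(variances, dtype=float) + tau2)   # random-effects weights

        def z_stat(theta):
            pooled = np.sum(w * theta) / np.sum(w)
            return pooled * np.sqrt(np.sum(w))                  # pooled estimate / its SE

        z_obs = abs(z_stat(effects))
        exceed = 0
        for _ in range(n_perm):
            signs = rng.choice([-1.0, 1.0], size=effects.shape)  # label switch = sign flip
            if abs(z_stat(signs * effects)) >= z_obs:
                exceed += 1
        # with few studies, all 2**k sign patterns can be enumerated exactly instead
        return exceed / n_perm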

Journal ArticleDOI
TL;DR: In this paper, the authors point out to applied researchers what adjustments are needed to the coefficient estimates in a random effects probit model in order to make valid comparisons in terms of coefficient estimates and marginal effects across different specifications.
Abstract: This note points out to applied researchers what adjustments are needed to the coefficient estimates in a random effects probit model in order to make valid comparisons in terms of coefficient estimates and marginal effects across different specifications. These adjustments are necessary because of the normalisation that is used by standard software in order to facilitate easy estimation of the random effects probit model.
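
The adjustment in question is usually stated as follows (a standard rescaling result, given here in generic notation rather than the note's own formulae). With

    y_{it}^{*} = x_{it}'\beta + u_i + \varepsilon_{it}, \qquad u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{it} \sim N(0, 1),

standard software normalises \mathrm{Var}(\varepsilon_{it}) = 1, so the reported \hat{\beta} is on the conditional (subject-specific) scale; for comparison with a pooled probit, or for marginal effects, the coefficients are rescaled as \hat{\beta}\sqrt{1-\hat{\rho}} = \hat{\beta}/\sqrt{1+\hat{\sigma}_u^2}, where \rho = \sigma_u^2/(1+\sigma_u^2) is the reported cross-period error correlation.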

Journal ArticleDOI
TL;DR: This textbook introduces the design and analysis of experiments, built around four basic designs: the basic factorial design (BF), the complete block design (CB), the Latin square design (LS), and the split plot/repeated measures design (SP/RM), together with ANOVA-based analysis, blocking, factorial crossing, and fixed versus random effects.
Abstract: (Table of contents) To the Instructor. Sample Exam Questions. To the Student. Acknowledgments.
1. Introduction to Experimental Design. 1. The challenge of planning a good experiment. 2. Three basic principles and four experimental designs. 3. The factor structure of the four experimental designs.
2. Informal Analysis and Checking Assumptions. 1. What analysis of variance does. 2. The six Fisher assumptions. 3. Informal analysis, part 1: parallel dot graphs and choosing a scale. 4. Informal analysis, part 2: interaction graph for the log concentrations.
3. Formal ANOVA: Decomposing the Data and Measuring Variability, Testing Hypotheses and Estimating True Differences. 1. Decomposing the data. 2. Computing mean squares to measure average variability. 3. Standard deviation = root mean square for residuals. 4. Formal hypothesis testing: are the effects detectable? 5. Confidence intervals: the likely size of true differences.
4. Decisions About the Content of an Experiment. 1. The response. 2. Conditions. 3. Material.
5. Randomization and the Basic Factorial Design. 1. The basic factorial design ("What you do"). 2. Informal analysis. 3. Factor structure ("What you get"). 4. Decomposition and analysis of variance for one-way BF designs. 5. Using a computer [Optional]. 6. Algebraic notation for factor structure [Optional].
6. Interaction and the Principle of Factorial Crossing. 1. Factorial crossing and the two-way basic factorial design, or BF[2]. 2. Interaction and the interaction graph. 3. Decomposition and ANOVA for the two-way design. 4. Using a computer [Optional]. 5. Algebraic notation for the two-way BF design [Optional].
7. The Principle of Blocking. 1. Blocking and the complete block design (CB). 2. Two nuisance factors: the Latin square design (LS). 3. The split plot/repeated measures design (SP/RM). 4. Decomposition and analysis of variance. 5. Scatterplots for data sets with blocks. 6. Using a computer [Optional]. 7. Algebraic notation for the CB, LS and SP/RM designs.
8. Working with the Four Basic Designs. 1. Comparing and recognizing design structures. 2. Choosing a design structure: deciding about blocking. 3. Informal analysis: examples. 4. Recognizing alternatives to ANOVA.
9. Extending the Basic Designs by Factorial Crossing. 1. Extending the BF design: general principles. 2. Three or more crossed factors of interest. 3. Compound within-blocks factors. 4. Graphical methods for 3-factor interactions. 5. Analysis of variance.
10. Decomposing a Data Set. 1. The basic decomposition step and the BF[1] design. 2. Decomposing data from balanced designs.
11. Comparisons, Contrasts, and Confidence Intervals. 1. Comparisons: confidence intervals and tests. 2. Adjustments for multiple comparisons. 3. Between-blocks factors and compound within-blocks factors. 4. Linear estimators and orthogonal contrasts [Optional].
12. The Fisher Assumptions and How to Check Them. 1. Same SDs (S). 2. Independent chance errors (I). 3. The normality assumption (N). 4. Effects are additive (A) and constant (C). 5. Estimating replacement values for outliers.
13. Other Experimental Designs and Models. 1. New factor structures built by crossing and nesting. 2. New uses for old factor structures: fixed versus random effects. 3. Models with mixed interaction effects. 4. Expected mean squares and F-ratios.
14. Continuous Carriers: A Visual Approach to Regression, Correlation and Analysis of Covariance. 1. Regression. 2. Balloon summaries and correlation. 3. Analysis of covariance.
15. Sampling Distributions and the Role of the Assumptions. 1. The logic of hypothesis testing. 2. Ways to think about sampling distributions. 3. Four fundamental families of distributions. 4. Sampling distributions for linear estimators. 5. Approximate sampling distributions for F-ratios. 6. Why (and when) are the models reasonable?
Tables. Data Sources. Subject Index. Examples.

Journal ArticleDOI
TL;DR: RMAOV, marginal models and multilevel models generally provided similar estimates and standard errors for the treatment effects, although in one example with a relatively complex variance structure the marginal model produced less efficient estimates.
Abstract: A variety of methods are available for analysing repeated measurements data where the outcome is continuous. However, there is little information on how established methods, such as summary statistics and repeated measures analysis of variance (RMAOV), compare in practice with methods that have become available to applied statisticians more recently, such as marginal models (based on generalized estimating equation methodology) and multilevel models (that is, hierarchical random effects models). The aim of this paper is to exemplify the use of these methods, and directly compare their results by application to a clinical trial data set. The focus is on practical aspects rather than technical issues. The data considered were taken from a clinical trial of treatments for asthma in 240 children, in which a baseline and four post-randomization measurements of outcomes were taken. The simplicity of the method of summary statistics using the post-randomization mean of observations provided a useful initial analysis. However, fixed time effects or treatment-time interactions cannot be included in such an analysis, and choice of appropriate weighting when there is substantial missing data is problematic. RMAOV, marginal models and multilevel models generally provided similar estimates and standard errors for the treatment effects, although in one example with a relatively complex variance structure the marginal model produced less efficient estimates. Two advantages of multilevel models are that they provide direct estimates of variance components which are often of interest in their own right, and that they can be naturally extended to handle multivariate outcomes.

Journal ArticleDOI
TL;DR: In this article, the authors compare the results from a random effects probit model with a semiparametric probit and a fixed effects logit model that makes no assumptions about the distribution of unobserved heterogeneity.
Abstract: An important theoretical problem for criminologists is an explanation for the robust positive correlation between prior and future criminal offending. Nagin and Paternoster (1991) have suggested that the correlation could be due to time-stable population differences in the underlying proneness to commit crimes (population heterogeneity) and/or the criminogenic effect that crime has on social bonds, conventional attachments, and the like (state dependence). Because of data and measurement limitations, the disentangling of population heterogeneity and state dependence requires that researchers control for unmeasured persistent heterogeneity. Frequently, random effects probit models have been employed, which, while user-friendly, make a strong parametric assumption that the unobserved heterogeneity in the population follows a normal distribution. Although semiparametric alternatives to the random effects probit model have recently appeared in the literature to avoid this problem, in this paper we return to reconsider the fully parametric model. Via simulation evidence, we first show that the random effects probit model produces biased estimates as the departure of heterogeneity from normality becomes more substantial. Using the 1958 Philadelphia cohort data, we then compare the results from a random effects probit model with a semiparametric probit model and a fixed effects logit model that makes no assumptions about the distribution of unobserved heterogeneity. We found that with this data set all three models converged on the same substantive result: even after controlling for unobserved persistent heterogeneity, with models that treat the unobserved heterogeneity very differently, prior conduct had a pronounced effect on subsequent offending. These results are inconsistent with a model that attributes all of the positive correlation between prior and future offending to differences in criminal propensity. Since researchers will often be completely blind with respect to the tenability of the normality assumption, we conclude that different estimation strategies should be brought to bear on the data.

Journal ArticleDOI
TL;DR: Predictive accuracy of truncated multiplicative models, shrinkage estimators of multiplicative models, and Best Linear Unbiased Predictors (BLUP) of the cell means based on a two-way random effects model with interaction were evaluated.
Abstract: Multiplicative statistical models such as the additive main effects and multiplicative interaction model (AMMI), the genotypes regression model (GREG), the sites regression model (SREG), the completely multiplicative model (COMM), and the shifted multiplicative model (SHMM) are useful for studying patterns of yield response across sites and estimating realized cultivar responses in specific environments. Traditionally, the series of multiplicative terms is truncated at some point beyond which further terms are believed to have little statistical significance or predictive value. Shrinkage estimators have been advocated as a model fitting method superior to model truncation. In this study, by data splitting and cross validation, we evaluated the predictive accuracy of (i) truncated multiplicative models, (ii) shrinkage estimators of multiplicative models, (iii) Best Linear Unbiased Predictors (BLUP) of the cell means based on a two-way random effects model with interaction, and (iv) empirical cell means in one wheat [durum (Triticum turgidum L. var. durum) and bread (Triticum aestivum L.)] and four maize (Zea mays L.) cultivar trials, with and without adjustment for replicate differences within environments. Shrinkage estimates of multiplicative models were at least as good as the better choice of truncated models fitted by least squares or BLUPs. Shrinkage estimation yields potentially better estimates of cultivar performance than do truncated multiplicative models and eliminates the need for cross validation or tests of hypotheses as criteria for determining the number of multiplicative terms to be retained. If random cross validation is used to choose a truncated model, data should be adjusted for replicate differences within environments.

Journal ArticleDOI
TL;DR: In this paper, the authors examined two strategies for meta-analysis of a series of 2 x 2 tables with the odds ratio modelled as a linear combination of study level covariates and random effects representing between-study variation.
Abstract: We examine two strategies for meta-analysis of a series of 2 x 2 tables with the odds ratio modelled as a linear combination of study level covariates and random effects representing between-study variation. Penalized quasi-likelihood (PQL), an approximate inference technique for generalized linear mixed models, and a linear model fitted by weighted least squares to the observed log-odds ratios are used to estimate regression coefficients and dispersion parameters. Simulation results demonstrate that both methods perform adequate approximate inference under many conditions, but that neither method works well in the presence of highly sparse data. Under certain conditions with small cell frequencies the PQL method provides better inference.

Journal ArticleDOI
TL;DR: This paper considers the problem of estimation of the rate of change of a disease marker in longitudinal studies, in which some subjects drop out prematurely (informatively) due to attrition, while others experience a non-informative drop-out process (end of study, withdrawal).
Abstract: Many cohort studies and clinical trials have designs which involve repeated measurements of disease markers. One problem in such longitudinal studies, when the primary interest is to estimate and to compare the evolution of a disease marker, is that planned data are not collected because of missing data due to missing visits and/or withdrawal or attrition (for example, death). Several methods to analyse such data are available, provided that the data are missing at random. However, serious biases can occur when missingness is informative. In such cases, one needs to apply methods that simultaneously model the observed data and the missingness process. In this paper we consider the problem of estimation of the rate of change of a disease marker in longitudinal studies, in which some subjects drop out prematurely (informatively) due to attrition, while others experience a non-informative drop-out process (end of study, withdrawal). We propose a method which combines a linear random effects model for the underlying pattern of the marker with a log-normal survival model for the informative drop-out process. Joint estimates are obtained through the restricted iterative generalized least squares method which are equivalent to restricted maximum likelihood estimates. A nested EM algorithm is applied to deal with censored survival data. The advantages of this method are: it provides a unified approach to estimate all the model parameters; it can effectively deal with irregular data (that is, measured at irregular time points), a complicated covariance structure and a complex underlying profile of the response variable; it does not entail such complex computation as would be required to maximize the joint likelihood. The method is illustrated by modelling CD4 count data in a clinical trial in patients with advanced HIV infection while its performance is tested by simulation studies.
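
One common way of writing this kind of joint model, hedged because the paper's exact specification and linkage may differ, is

    Y_i(t) = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,t + \varepsilon_i(t), \qquad (b_{0i}, b_{1i}) \sim N(0, D),
    \log T_i = \gamma_0 + \gamma' w_i + \gamma_b\, b_{1i} + e_i, \qquad e_i \sim N(0, \sigma_e^2),

so the subject-specific rate of change b_{1i} of the marker enters the log-normal model for the informative drop-out time T_i, while non-informative drop-out (end of study, withdrawal) is handled through the usual missing-at-random machinery.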

Journal ArticleDOI
TL;DR: In this paper, a hierarchical Bayes generalized linear model approach is taken which connects the local areas, thereby enabling one to "borrow strength" to estimate cancer incidence rates for local areas.

Journal ArticleDOI
TL;DR: A semiparametric mixed effects regression model is proposed for the analysis of clustered or longitudinal data with continuous, ordinal, or binary outcome and Monte Carlo results are presented to show that the method can improve the mean squared error of the fixed effects estimators when the random effects distribution is not Gaussian.
Abstract: A semiparametric mixed effects regression model is proposed for the analysis of clustered or longitudinal data with continuous, ordinal, or binary outcome. The common assumption of Gaussian random effects is relaxed by using a predictive recursion method (Newton and Zhang, 1999) to provide a nonparametric smooth density estimate. A new strategy is introduced to accelerate the algorithm. Parameter estimates are obtained by maximizing the marginal profile likelihood by Powell's conjugate direction search method. Monte Carlo results are presented to show that the method can improve the mean squared error of the fixed effects estimators when the random effects distribution is not Gaussian. The usefulness of visualizing the random effects density itself is illustrated in the analysis of data from the Wisconsin Sleep Survey. The proposed estimation procedure is computationally feasible for quite large data sets.

Journal ArticleDOI
TL;DR: Estimates of genetic parameters resulting from various analytical models for birth weight, 205-d weight, and YWT were compared and rankings on predictions of breeding values were the same regardless of whether inbreeding coefficients for animal and dam were included in the models.
Abstract: Estimates of genetic parameters resulting from various analytical models for birth weight (BWT, n = 4,155), 205-d weight (WWT, n = 3,884), and 365-d weight (YWT, n = 3,476) were compared. Data consisted of records for Line 1 Hereford cattle selected for postweaning growth from 1934 to 1989 at ARS-USDA, Miles City, MT. Twelve models were compared. Model 1 included fixed effects of year, sex, age of dam; covariates for birth day and inbreeding coefficients of animal and of dam; and random animal genetic and residual effects. Model 2 was the same as Model 1 but ignored inbreeding coefficients. Model 3 was the same as Model 1 and included random maternal genetic effects with covariance between direct and maternal genetic effects, and maternal permanent environmental effects. Model 4 was the same as Model 3 but ignored inbreeding. Model 5 was the same as Model 1 but with a random sire effect instead of animal genetic effect. Model 6 was the same as Model 5 but ignored inbreeding. Model 7 was a sire model that considered relationships among males. Model 8 was a sire model, assuming sires to be unrelated, but with dam effects as uncorrelated random effects to account for maternal effects. Model 9 was a sire and dam model but with relationships to account for direct and maternal genetic effects; dams also were included as uncorrelated random effects to account for maternal permanent environmental effects. Model 10 was a sire model with maternal grandsire and dam effects all as uncorrelated random effects. Model 11 was a sire and maternal grandsire model, with dams as uncorrelated random effects but with sires and maternal grandsires assumed to be related using male relationships. Model 12 was the same as Model 11 but with all pedigree relationships from the full animal model for sires and maternal grandsires. Rankings on predictions of breeding values were the same regardless of whether inbreeding coefficients for animal and dam were included in the models. Heritability estimates were similar regardless of whether inbreeding effects were in the model. Models 3 and 9 best fit the data for estimation of variances and covariances for direct, maternal genetic, and permanent environmental effects. Other models resulted in changes in ranking for predicted breeding values and for estimates of direct and maternal heritability. Heritability estimates of direct effects were smallest with sire and sire-maternal grandsire models.
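
In matrix form, the best-fitting specifications (Models 3 and 9) correspond to the usual maternal-effects mixed model; a generic sketch, not the paper's notation:

    y = Xb + Z_a a + Z_m m + Z_c c + e,
    \mathrm{Var}(a) = A\sigma_a^2, \quad \mathrm{Var}(m) = A\sigma_m^2, \quad \mathrm{Cov}(a, m) = A\sigma_{am}, \quad \mathrm{Var}(c) = I\sigma_c^2, \quad \mathrm{Var}(e) = I\sigma_e^2,

where a and m are direct and maternal additive genetic effects (A the numerator relationship matrix), c the maternal permanent environmental effects, and the direct-maternal covariance \sigma_{am} is included explicitly in Models 3 and 9.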

Journal ArticleDOI
TL;DR: A linear mixed model assuming heterogeneous residual variances and known constant variance ratios, following Meuwissen et al., is applied to the analysis of milk, fat, and protein yields, and fat and protein contents, in the French Holstein, Montbeliarde, and Normande dairy cattle populations.

Journal ArticleDOI
TL;DR: In this article, the authors compare the random effects approach with the generalized estimating equation (GEE) approach and conclude that the GEE approach is inappropriate for assessing the treatment effects for these data.
Abstract: The generalized estimating equation (GEE) approach to the analysis of longitudinal data has many attractive robustness properties and can provide a 'population average' characterization of interest, for example, to clinicians who have to treat patients on the basis of their observed characteristics. However, these methods have limitations which restrict their usefulness in both the social and the medical sciences. This conclusion is based on the premise that the main motivations for longitudinal analysis are insight into microlevel dynamics and improved control for omitted or unmeasured variables. We claim that to address these issues a properly formulated random-effects model is required. In addition to a theoretical assessment of some of the issues, we illustrate this by reanalysing data on polyp counts. In this example, the covariates include a base-line outcome, and the effectiveness of the treatment seems to vary by base-line. We compare the random-effects approach with the GEE approach and conclude that the GEE approach is inappropriate for assessing the treatment effects for these data.
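
The population-averaged versus subject-specific distinction at issue can be made concrete for a logit link (a standard approximation, not taken from the paper, which analyses count data):

    \text{conditional:}\quad \mathrm{logit}\,P(Y_{it}=1 \mid b_i) = x_{it}'\beta_C + b_i, \qquad b_i \sim N(0, \sigma_b^2),
    \text{marginal (GEE target):}\quad \mathrm{logit}\,P(Y_{it}=1) = x_{it}'\beta_M, \qquad \beta_M \approx \beta_C\,(c^2\sigma_b^2 + 1)^{-1/2}, \quad c = 16\sqrt{3}/(15\pi),

so the two sets of coefficients estimate different quantities, which is the crux of the argument that a population-averaged characterization need not be the right target when interest lies in micro-level dynamics or in treatment effects that vary with baseline.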

Journal ArticleDOI
TL;DR: The proposed model is fitted to a longitudinal randomized trial of a cardiovascular educational program where the responses of interest are change in hypertension and hypercholesterolemia status, and is compared to a naive bivariate model that assumes independence between time points and to univariate mixed effects logit models.
Abstract: When two binary responses are measured for each study subject across time, it may be of interest to model how the bivariate associations and marginal univariate risks involving the two responses change across time. To achieve such a goal, marginal models with bivariate log odds ratio and univariate logit components are extended to include random effects for all components. Specifically, separate normal random effects are specified on the log odds ratio scale for bivariate responses and on the logit scale for univariate responses. Assuming conditional independence given the random effects facilitates the modeling of bivariate associations across time with missing at random incomplete data. We fit the model to a dataset for which such structures are feasible: a longitudinal randomized trial of a cardiovascular educational program where the responses of interest are change in hypertension and hypercholesterolemia status. The proposed model is compared to a naive bivariate model that assumes independence between time points and univariate mixed effects logit models.

Journal ArticleDOI
TL;DR: The results showed that common environmental effects and non-additive genetic effects were more important sources of variability than maternal genetic effects, and high variability in parameter estimates for replicate populations was demonstrated.

Journal ArticleDOI
TL;DR: The authors proposed a method of inference for generalized linear mixed models (GLMM) that in many ways resembles the method of least squares and showed that adequate inference about GLMM can be made based on the conditional likelihood on a subset of the random effects.
Abstract: We propose a method of inference for generalized linear mixed models (GLMM) that in many ways resembles the method of least squares. We also show that adequate inference about GLMM can be made based on the conditional likelihood on a subset of the random effects. One of the important features of our methods is that they rely on weak distributional assumptions about the random effects. The methods proposed are also computationally feasible. Asymptotic behavior of the estimates is investigated. In particular, consistency is proved under reasonable conditions.

01 Jan 1999
TL;DR: In this article, the authors used restricted maximum likelihood (REML) procedures to estimate heritability and genetic correlations for direct and maternal genetic effects using two underlying scales (probit and logit) for lamb survival in order to implement a new procedure of genetic evaluation.
Abstract: Lamb survival is a trait of economic importance in sheep production. Losses may be attributed to the lamb or to the behaviour of the dam. Maternal effects are environmental for the offspring but are at least partly genetic for the dam. The genetic evaluation of survival is complicated by the fact that this trait has a binomial distribution whereas most analytical procedures assume normality. The objective of this study was to estimate heritabilities and genetic correlations for direct and maternal genetic effects using two underlying scales (probit and logit) for lamb survival in order to implement a new procedure of genetic evaluation. Variance components were obtained by Restricted Maximum Likelihood (REML) procedures after transformation of the data using a linear mixed model. Records of survival were from 25,874 lambs born over the period 1989 to 1995 in a prolific Romney flock representing the progeny of 218 sires and 6,771 dams. The model included fixed effects of sex, year, litter size at birth and age of dam and the random effects for direct genetic, maternal genetic and maternal environmental influences. Lambing and survival-to-weaning percentages were 178% and 87%, respectively. The estimates of heritability on the logit scale were 0.01, 0.03 and 0.04 for direct, maternal and total genetic effects, respectively. The proportion of total variance explained by maternal environmental influences was 0.09. The estimated genetic correlation between direct and maternal genetic effects was −0.26. Similar estimates were obtained on the probit scale. Estimates will be lower for the observed scale. These results indicate a greater opportunity for improving survival by manipulating the environment rather than by selection. However survival analysis is an important component of quality assurance in a genetic improvement scheme even if the resultant breeding values are not used for direct selection.
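
The remark that estimates will be lower on the observed scale refers to the usual threshold-model conversion (a standard formula, not quoted from the paper):

    h^2_{\text{observed}} \approx h^2_{\text{liability}} \cdot \frac{z^2}{p(1-p)},

where p is the incidence (here a survival rate of about 0.87) and z is the standard normal density at the corresponding threshold; since z^2/[p(1-p)] is at most 2/\pi \approx 0.64, heritabilities on the observed 0/1 scale are always smaller than on the underlying probit (liability) scale.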

Journal ArticleDOI
TL;DR: The author argues for the superiority of random effects models, in which parameters are regarded as random variables with distributions governed by hyperparameters describing the patterns of interest.
Abstract: Wildlife management is increasingly guided by analyses of large and complex datasets. The description of such datasets often requires a large number of parameters, among which certain patterns might be discernible. For example, one may consider a long-term study producing estimates of annual survival rates; of interest is the question whether these rates have declined through time. Several statistical methods exist for examining pattern in collections of parameters. Here, I argue for the superiority of random effects models in which parameters are regarded as random variables, with distributions governed by hyperparameters describing the patterns of interest. Unfortunately, implementation of random effects models is sometimes difficult. Ultrastructural models, in which the postulated pattern is built into the parameter structure of the original data analysis, are approximations to random effects models. However, this approximation is not completely satisfactory: failure to account for natural variation among parameters can lead to overstatement of the evidence for pattern among parameters. I describe quasi-likelihood methods that can be used to improve the approximation of random effects models by ultrastructural models.

Journal ArticleDOI
TL;DR: This paper argues for the general use of random effect models, and illustrates the value of non-parametric maximum likelihood (NPML) analysis of multi-centre trials.
Abstract: The meta-analysis of multi-centre trials can be based on either fixed or random effect models. This paper argues for the general use of random effect models, and illustrates the value of non-parametric maximum likelihood (NPML) analysis of such trials. The same general approach unifies administrative 'league table' analyses in epidemiological and other studies. Several examples of the NPML analysis are given, including a 70-centre trial.

Journal ArticleDOI
TL;DR: In this article, a version of the nonlinear mixed-effects model is presented that allows random effects only on the linear coefficients, and data that are missing at random are allowed on the repeated measures or on the observed variables of the factor analysis submodel.
Abstract: A version of the nonlinear mixed-effects model is presented that allows random effects only on the linear coefficients. Nonlinear parameters are not stochastic. In nonlinear regression, this kind of model has been called conditionally linear. As a mixed-effects model, this structure is more flexible than the popular linear mixed-effects model, while being nearly as straightforward to estimate. In addition to the structure for the repeated measures, a latent variable model (Browne, 1993) is specified for a distinct set of covariates that are related to the random effects in the second level. Unbalanced data are allowed on the repeated measures, and data that are missing at random are allowed on the repeated measures or on the observed variables of the factor analysis submodel. Features of the model are illustrated by two examples. Multilevel models are widely used to study the effects of treatments or to characterize differences between intact groups in designs where individuals are hierarchically nested within random levels of a second variable. Comprehensive reviews of this methodology with emphasis on cluster sampling problems have been presented by Bock (1989), Bryk and Raudenbush (1992), and Goldstein (1987). Essentially the same technology is applied in the analysis of repeated measures. Instead of a model for subjects selected from units of an organization, the prototypical repeated measures design is a series of measurements for a particular individual randomly selected from a population. Recent texts by Crowder and Hand (1990), Davidian and Giltinan (1995), Diggle, Liang, and Zeger (1994), and Vonesh and Chinchilli (1997) contain overviews of this approach, including a variety of extensions and case studies. In the repeated measures context, the model is often called a mixed-effects model. Several characteristics make this model attractive for the study of repeated measures: (a) It allows for individual functions to differ from the mean function over the population of subjects, yet characterizes both population and individual patterns as members of a single response function; (b) Subjects can be measured