
Showing papers on "Random effects model published in 2006"


Journal ArticleDOI
TL;DR: New procedures for evaluating direct, indirect, and total effects in multilevel models when all relevant variables are measured at Level 1 and all effects are random are proposed.
Abstract: The authors propose new procedures for evaluating direct, indirect, and total effects in multilevel models when all relevant variables are measured at Level 1 and all effects are random. Formulas are provided for the mean and variance of the indirect and total effects and for the sampling variances of the average indirect and total effects. Simulations show that the estimates are unbiased under most conditions. Confidence intervals based on a normal approximation or a simulated sampling distribution perform well when the random effects are normally distributed but less so when they are nonnormally distributed. These methods are further developed to address hypotheses of moderated mediation in the multilevel context. An example demonstrates the feasibility and usefulness of the proposed methods.
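The key quantity in this abstract can be checked numerically. The sketch below (illustrative values, not the authors' code) verifies that with random paths a_j (X to M) and b_j (M to Y) in cluster j, the average indirect effect is E[a_j b_j] = mu_a * mu_b + cov(a_j, b_j), not simply the product of the mean paths:

```python
import numpy as np

# Monte Carlo check of the average indirect effect under random paths.
# The mean paths and random-effect covariance below are assumed values.
rng = np.random.default_rng(0)
mu = np.array([0.5, 0.3])                  # mean paths mu_a, mu_b (assumed)
cov = np.array([[0.04, 0.02],
                [0.02, 0.09]])             # cov(a_j, b_j) = 0.02 (assumed)
ab = rng.multivariate_normal(mu, cov, size=200_000)
avg_indirect_mc = (ab[:, 0] * ab[:, 1]).mean()
avg_indirect_formula = mu[0] * mu[1] + cov[0, 1]   # 0.15 + 0.02 = 0.17
```

Ignoring the covariance term here would understate the average indirect effect by cov(a_j, b_j).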

1,375 citations


Posted Content
TL;DR: This article proposed a new variance estimator for OLS as well as for nonlinear estimators such as logit, probit and GMM, that provides cluster-robust inference when there is two-way or multi-way clustering that is non-nested.
Abstract: In this paper we propose a new variance estimator for OLS as well as for nonlinear estimators such as logit, probit and GMM, that provides cluster-robust inference when there is two-way or multi-way clustering that is non-nested. The variance estimator extends the standard cluster-robust variance estimator or sandwich estimator for one-way clustering (e.g. Liang and Zeger (1986), Arellano (1987)) and relies on similar relatively weak distributional assumptions. Our method is easily implemented in statistical packages, such as Stata and SAS, that already offer cluster-robust standard errors when there is one-way clustering. The method is demonstrated by a Monte Carlo analysis for a two-way random effects model; a Monte Carlo analysis of a placebo law that extends the state-year effects example of Bertrand et al. (2004) to two dimensions; and by application to two studies in the empirical public/labor literature where two-way clustering is present.
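The two-way estimator described above combines three one-way sandwich estimators as V_two = V_G + V_H - V_{G∩H}. A minimal numpy sketch of that combination for OLS (toy data; cluster sizes and the DGP are assumptions for illustration):

```python
import numpy as np

def cluster_robust_cov(X, resid, clusters):
    """One-way cluster-robust (sandwich) covariance for OLS."""
    bread = np.linalg.inv(X.T @ X)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        sg = X[clusters == g].T @ resid[clusters == g]  # cluster score sum
        meat += np.outer(sg, sg)
    return bread @ meat @ bread

rng = np.random.default_rng(1)
n = 400
g = rng.integers(0, 10, n)                 # first clustering dimension
h = rng.integers(0, 10, n)                 # second clustering dimension
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta
# Two-way formula: add the two one-way pieces, subtract the intersection
V_two = (cluster_robust_cov(X, u, g)
         + cluster_robust_cov(X, u, h)
         - cluster_robust_cov(X, u, g * 100 + h))
se = np.sqrt(np.diag(V_two))
```

The subtraction removes the intersection cells counted twice; in small samples the resulting matrix is not guaranteed positive semi-definite, which the paper discusses.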

923 citations


Journal ArticleDOI
TL;DR: A simulation approach was used to clarify the application of random effects under three common situations for telemetry studies; it was found that random intercepts accounted for unbalanced sample designs, and models with random intercepts and coefficients improved model fit given the variation in selection among individuals and functional responses in selection.
Abstract: 1. Resource selection estimated by logistic regression is used increasingly in studies to identify critical resources for animal populations and to predict species occurrence. 2. Most frequently, individual animals are monitored and pooled to estimate population-level effects without regard to group or individual-level variation. Pooling assumes that both observations and their errors are independent, and resource selection is constant given individual variation in resource availability. 3. Although researchers have identified ways to minimize autocorrelation, variation between individuals caused by differences in selection or available resources, including functional responses in resource selection, have not been well addressed. 4. Here we review random-effects models and their application to resource selection modelling to overcome these common limitations. We present a simple case study of an analysis of resource selection by grizzly bears in the foothills of the Canadian Rocky Mountains with and without random effects. 5. Both categorical and continuous variables in the grizzly bear model differed in interpretation, both in statistical significance and coefficient sign, depending on how a random effect was included. We used a simulation approach to clarify the application of random effects under three common situations for telemetry studies: (a) discrepancies in sample sizes among individuals; (b) differences among individuals in selection where availability is constant; and (c) differences in availability with and without a functional response in resource selection. 6. We found that random intercepts accounted for unbalanced sample designs, and models with random intercepts and coefficients improved model fit given the variation in selection among individuals and functional responses in selection. 
Our empirical example and simulations demonstrate how including random effects in resource selection models can aid interpretation and address difficult assumptions limiting their generality. This approach will allow researchers to appropriately estimate marginal (population) and conditional (individual) responses, and account for complex grouping, unbalanced sample designs and autocorrelation.
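One way to see why random intercepts handle the unbalanced telemetry designs mentioned above is through their shrinkage behaviour. This toy sketch (not the paper's grizzly bear analysis; variances and sample sizes are assumed) shows that sparsely sampled individuals are shrunk more strongly toward the population mean:

```python
import numpy as np

# BLUP-style shrinkage of per-individual means under unbalanced sampling.
rng = np.random.default_rng(2)
tau2, sigma2 = 1.0, 4.0                  # between- / within-animal variance (assumed)
n_j = np.array([5, 50, 500])             # very unbalanced numbers of fixes
b_true = rng.normal(0, np.sqrt(tau2), 3)
ybar = b_true + rng.normal(0, np.sqrt(sigma2 / n_j))   # observed animal means
shrink = tau2 / (tau2 + sigma2 / n_j)    # shrinkage factor, grows with n_j
blup = shrink * ybar                     # predicted random intercepts
```

The animal with only 5 fixes borrows the most strength from the population, which is exactly how the mixed model stabilises estimates across discrepant sample sizes.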

718 citations


Book
13 Jul 2006
TL;DR: This book develops an extended likelihood framework, based on the h-likelihood, for joint inference about fixed and random parameters in hierarchical generalized linear models, unifying marginal, conditional, and profile likelihood methods.
Abstract (table of contents): List of Notations; Preface; Introduction. CLASSICAL LIKELIHOOD THEORY: definition; quantities derived from the likelihood; profile likelihood; distribution of the likelihood-ratio statistic; distribution of the MLE and the Wald statistic; model selection; marginal and conditional likelihoods; higher-order approximations; adjusted profile likelihood; Bayesian and likelihood methods; Jacobian in likelihood methods. GENERALIZED LINEAR MODELS: linear models; generalized linear models; model checking; examples. QUASI-LIKELIHOOD: examples; iterative weighted least squares; asymptotic inference; dispersion models; extended quasi-likelihood; joint GLM of mean and dispersion; joint GLMs for quality improvement. EXTENDED LIKELIHOOD INFERENCES: two kinds of likelihood; inference about the fixed parameters; inference about the random parameters; optimality in random-parameter estimation; canonical scale, h-likelihood and joint inference; statistical prediction; regression as an extended model; missing or incomplete-data problems; is marginal likelihood enough for inference about fixed parameters?; summary: likelihoods in an extended framework. NORMAL LINEAR MIXED MODELS: developments of normal mixed linear models; likelihood estimation of fixed parameters; classical estimation of random effects; h-likelihood approach; example; invariance and likelihood inference. HIERARCHICAL GLMS: HGLMs; h-likelihood; inferential procedures using h-likelihood; penalized quasi-likelihood; deviances in HGLMs; examples; choice of random-effect scale. HGLMS WITH STRUCTURED DISPERSION: HGLMs with structured dispersion; quasi-HGLMs; examples. CORRELATED RANDOM EFFECTS FOR HGLMS: HGLMs with correlated random effects; random effects described by fixed L matrices; random effects described by a covariance matrix; random effects described by a precision matrix; fitting and model-checking; examples; twin and family data; ascertainment problem. SMOOTHING: spline models; mixed-model framework; automatic smoothing; non-Gaussian smoothing. RANDOM-EFFECT MODELS FOR SURVIVAL DATA: proportional-hazard model; frailty models and the associated h-likelihood; *mixed linear models with censoring; extensions; proofs. DOUBLE HGLMS: DHGLMs; models for finance data; h-likelihood procedure for fitting DHGLMs; random effects in the ? component; examples. FURTHER TOPICS: model for multivariate responses; joint model for continuous and binary data; joint model for repeated measures and survival time; missing data in longitudinal studies; denoising signals by imputation. References; Data Index; Author Index; Subject Index.

495 citations


Journal ArticleDOI
TL;DR: The authors developed a mixed (fixed and random effects) models approach to the age-period-cohort (APC) analysis of micro data sets in the form of a series of repeated cross-section sample surveys that are increasingly available to sociologists.

Abstract: We develop a mixed (fixed and random effects) models approach to the age-period-cohort (APC) analysis of micro data sets in the form of a series of repeated cross-section sample surveys that are increasingly available to sociologists. This approach recognizes the multilevel structure of the individual-level responses. As a substantive illustration, we apply our proposed methodology to data on verbal test scores from 15 cross-sections of the General Social Survey, 1974–2000. These data have been the subject of recent debates in the sociological literature. We show how our approach can be used to shed new light on these debates by identifying and estimating age, period, and cohort components of change.
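The identification problem that motivates the mixed-model APC approach is the exact linear dependency cohort = period - age, which makes the linear effects inseparable in a single-level regression. A two-line demonstration (toy ages and periods, chosen for illustration):

```python
import numpy as np

# The classic APC identification problem: with cohort = period - age,
# a design matrix containing all three linear terms is rank-deficient.
age    = np.array([20, 30, 40, 20, 30, 40], float)
period = np.array([1980, 1980, 1980, 1990, 1990, 1990], float)
cohort = period - age                     # exact linear dependency
X = np.column_stack([np.ones(6), age, period, cohort])
rank = np.linalg.matrix_rank(X)           # 3, not 4: effects not separable
```

Treating period and cohort as random effects in a multilevel model, as the paper does, is one way to break this deadlock.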

364 citations


Journal ArticleDOI
TL;DR: It is demonstrated that making distribution models spatially explicit can be essential for accurately characterizing the environmental response of species, predicting their probability of occurrence, and assessing uncertainty in the model results.
Abstract: Models of the geographic distributions of species have wide application in ecology. But the nonspatial, single-level, regression models that ecologists have often employed do not deal with problems of irregular sampling intensity or spatial dependence, and do not adequately quantify uncertainty. We show here how to build statistical models that can handle these features of spatial prediction and provide richer, more powerful inference about species niche relations, distributions, and the effects of human disturbance. We begin with a familiar generalized linear model and build in additional features, including spatial random effects and hierarchical levels. Since these models are fully specified statistical models, we show that it is possible to add complexity without sacrificing interpretability. This step-by-step approach, together with attached code that implements a simple, spatially explicit, regression model, is structured to facilitate self-teaching. All models are developed in a Bayesian framework. We assess the performance of the models by using them to predict the distributions of two plant species (Proteaceae) from South Africa's Cape Floristic Region. We demonstrate that making distribution models spatially explicit can be essential for accurately characterizing the environmental response of species, predicting their probability of occurrence, and assessing uncertainty in the model results. Adding hierarchical levels to the models has further advantages in allowing human transformation of the landscape to be taken into account, as well as additional features of the sampling process.
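The paper's core building block, a GLM augmented with a spatial random effect, can be sketched generatively. Here (assumed scales and covariance parameters, not the paper's code) a spatially correlated effect with exponential covariance is added to a logit linear predictor:

```python
import numpy as np

# Simulate occurrence probabilities from a spatially explicit logistic model:
# eta = beta0 + beta1 * x + w, with w a Gaussian spatial random effect.
rng = np.random.default_rng(3)
coords = rng.uniform(0, 10, size=(50, 2))            # site locations
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
Sigma = 1.0 * np.exp(-d / 2.0)                       # sill 1.0, range 2.0 (assumed)
w = rng.multivariate_normal(np.zeros(50), Sigma)     # spatial random effect
eta = -1.0 + 0.5 * coords[:, 0] + w                  # linear predictor
p = 1.0 / (1.0 + np.exp(-eta))                       # occurrence probability
```

Nearby sites share similar w values, so residual spatial dependence is modelled rather than ignored, which is precisely what the nonspatial GLMs criticised above fail to do.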

352 citations


Journal ArticleDOI
TL;DR: The h‐likelihood provides a unified framework for this new class of models and gives a single algorithm for fitting all members of the class, which will enable models with heavy‐tailed distributions to be explored and provide robust estimation against outliers.
Abstract: Summary. We propose a class of double hierarchical generalized linear models in which random effects can be specified for both the mean and dispersion. Heteroscedasticity between clusters can be modelled by introducing random effects in the dispersion model, just as heterogeneity between clusters is modelled by random effects in the mean model. This class will, among other things, enable models with heavy-tailed distributions to be explored, providing robust estimation against outliers. The h-likelihood provides a unified framework for this new class of models and gives a single algorithm for fitting all members of the class. This algorithm does not require quadrature or prior probabilities.
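Why random effects in the dispersion yield heavy tails: a normal response whose variance is itself random is a scale mixture. This toy sketch (not the authors' h-likelihood algorithm; the inverse-gamma parameters are assumed) shows the resulting excess kurtosis:

```python
import numpy as np

# Normal scale mixture: variance ~ inverse-gamma gives a Student-t,
# whose kurtosis exceeds the normal's value of 3.
rng = np.random.default_rng(4)
a, b = 4.5, 3.5                          # inverse-gamma shape/scale, E[s2] = 1
s2 = b / rng.gamma(shape=a, scale=1.0, size=200_000)  # random dispersions
x = rng.normal(0.0, np.sqrt(s2))         # marginally t-distributed (df = 2a)
kurt = np.mean((x - x.mean())**4) / np.var(x)**2      # ~4.2 here, not 3
```

This is the mechanism by which dispersion random effects deliver outlier-robust estimation: the implied marginal distribution discounts extreme observations.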

322 citations


Journal ArticleDOI
TL;DR: This paper illustrates the use of Proc MIXED of the SAS system to implement REML estimation of genotypic and phenotypic correlations and a method to obtain approximate parametric estimates of the sampling variances of the correlation estimates is presented.
Abstract: Plant breeders traditionally have estimated genotypic and phenotypic correlations between traits using the method of moments on the basis of a multivariate analysis of variance (MANOVA). Drawbacks of using the method of moments to estimate variance and covariance components include the possibility of obtaining estimates outside of parameter bounds, reduced estimation efficiency, and ignorance of the estimators' distributional properties when data are missing. An alternative approach that does not suffer these problems, but depends on the assumption of normally distributed random effects and large sample sizes, is restricted maximum likelihood (REML). This paper illustrates the use of Proc MIXED of the SAS system to implement REML estimation of genotypic and phenotypic correlations. Additionally, a method to obtain approximate parametric estimates of the sampling variances of the correlation estimates is presented. MANOVA and REML methods were compared with a real data set and with simulated data. The simulation study examined the effects of different correlation parameter values, genotypic and environmental sample sizes, and proportion of missing data on Type I and Type II error rates and on accuracy of confidence intervals. The two methods provided similar results when data were balanced or only 5% of data were missing. However, when 15 or 25% data were missing, the REML method generally performed better, resulting in higher power of detection of correlations and more accurate 95% confidence intervals. Samples of at least 75 genotypes and two environments are recommended to obtain accurate confidence intervals using the proposed method.
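The MANOVA method-of-moments baseline that the paper compares REML against can be sketched for a balanced trial. Below (simulated data; genotype count, replicate count, and the true correlation of 0.6 are assumptions) the genotypic covariance is recovered from mean cross-product matrices:

```python
import numpy as np

# Method-of-moments (MANOVA) estimate of a genotypic correlation,
# balanced design: g genotypes x r replicates x 2 traits.
rng = np.random.default_rng(5)
g, r = 100, 4
Gcov = np.array([[1.0, 0.6], [0.6, 1.0]])       # true genotypic (co)variance
geno = rng.multivariate_normal(np.zeros(2), Gcov, size=g)
y = geno[:, None, :] + rng.normal(0, 1.0, size=(g, r, 2))   # plot-level data
gmean = y.mean(axis=1)
MS_G = r * np.cov(gmean.T)                       # estimates r*G + E
MS_E = sum(np.cov(y[i].T) for i in range(g)) / g # estimates E
Sg = (MS_G - MS_E) / r                           # genotypic covariance matrix
rho_hat = Sg[0, 1] / np.sqrt(Sg[0, 0] * Sg[1, 1])
```

With missing or unbalanced data this moment approach breaks down, which is where the paper's REML route (e.g. via Proc MIXED) takes over; the equivalent estimate can fall outside [-1, 1], one of the drawbacks the abstract notes.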

317 citations


Posted Content
TL;DR: xtoverid computes versions of a test of overidentifying restrictions (orthogonality conditions) for a panel data estimation as mentioned in this paper, where the test statistic is distributed as chi-squared with degrees of freedom = L-K, where L is the number of excluded instruments and K is the number of regressors.
Abstract: xtoverid computes versions of a test of overidentifying restrictions (orthogonality conditions) for a panel data estimation. For an instrumental variables estimation, this is a test of the null hypothesis that the excluded instruments are valid instruments, i.e., uncorrelated with the error term and correctly excluded from the estimated equation. The test statistic is distributed as chi-squared with degrees of freedom = L-K, where L is the number of excluded instruments and K is the number of regressors, and a rejection casts doubt on the validity of the instruments. xtoverid will report tests of overidentifying restrictions after IV estimation using fixed effects, first differences, random effects, and the Hausman-Taylor estimator. A test of fixed vs. random effects is also a test of overidentifying restrictions, and xtoverid will report this test after a standard panel data estimation with xtreg,re. This routine is now included in the overid package.
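The chi-squared test described above can be illustrated with a Sargan-style statistic for linear IV: n times the R-squared from regressing the 2SLS residuals on the full instrument set, with df = L - K. A hedged numpy sketch (simulated data in which the instruments are valid by construction; this mirrors the logic of xtoverid rather than reproducing its Stata code):

```python
import numpy as np
from scipy import stats

# Sargan overidentification statistic for 2SLS with 3 instruments, 1 regressor.
rng = np.random.default_rng(6)
n = 2000
z = rng.normal(size=(n, 3))                               # excluded instruments
x = z @ np.array([1.0, 0.5, 0.5]) + rng.normal(size=n)    # single regressor
y = 2.0 * x + rng.normal(size=n)                          # instruments valid here
Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), x])
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]           # first stage
beta = np.linalg.lstsq(Xhat, y, rcond=None)[0]            # 2SLS estimate
u = y - X @ beta
uhat = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]           # project residuals on Z
r2 = 1 - ((u - uhat)**2).sum() / ((u - u.mean())**2).sum()
sargan = n * r2
pval = stats.chi2.sf(sargan, df=3 - 1)                    # df = L - K = 2
```

A small p-value would cast doubt on instrument validity, matching the interpretation in the abstract.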

253 citations


Journal ArticleDOI
TL;DR: Disease-mapping models for areal data often have fixed effects to measure the effect of spatially varying covariates and random effects with a conditionally autoregressive (CAR) prior to account for spatial clustering, but adding the CAR random effects can cause large changes in the posterior mean and variance of fixed effects compared to the nonspatial regression model.
Abstract: Disease-mapping models for areal data often have fixed effects to measure the effect of spatially varying covariates and random effects with a conditionally autoregressive (CAR) prior to account for spatial clustering. In such spatial regressions, the objective may be to estimate the fixed effects while accounting for the spatial correlation. But adding the CAR random effects can cause large changes in the posterior mean and variance of fixed effects compared to the nonspatial regression model. This article explores the impact of adding spatial random effects on fixed effect estimates and posterior variance. Diagnostics are proposed to measure posterior variance inflation from collinearity between the fixed effect covariates and the CAR random effects and to measure each region's influence on the change in the fixed effect's estimates by adding the CAR random effects. A new model that alleviates the collinearity between the fixed effect covariates and the CAR random effects is developed and extensions of these methods to point-referenced data models are discussed.
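The CAR structure at the heart of this paper is easy to write down for a toy map. The sketch below (a 4-region line graph with assumed parameters; a proper-CAR variant, not the paper's exact prior) builds the precision matrix and shows that neighbouring regions end up positively correlated:

```python
import numpy as np

# Proper CAR precision: Q = tau * (D - rho * W), W = adjacency, D = row sums.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)      # 4 regions in a line (toy map)
D = np.diag(W.sum(axis=1))
tau, rho = 1.0, 0.9                      # assumed precision and spatial parameter
Q = tau * (D - rho * W)                  # positive definite for |rho| < 1
Sigma = np.linalg.inv(Q)                 # implied covariance of random effects
corr01 = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])  # neighbours: > 0
```

It is this induced spatial correlation that can become collinear with spatially smooth fixed-effect covariates, producing the posterior variance inflation the article diagnoses.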

249 citations


01 Jan 2006
TL;DR: In this paper, the authors describe the REML-E-BLUP method and illustrate the method with some data on soil water content that exhibit a pronounced spatial trend, which is a special case of the linear mixed model where our data are modelled as the additive combination of fixed effects (e.g. the unknown mean, coefficients of a trend model), random effects (the spatially dependent random variation in the geostatistical context) and independent random error (nugget variation in geostatistics).
Abstract: Geostatistical estimates of a soil property by kriging are equivalent to the best linear unbiased predictions (BLUPs). Universal kriging is BLUP with a fixed-effect model that is some linear function of spatial coordinates, or more generally a linear function of some other secondary predictor variable when it is called kriging with external drift. A problem in universal kriging is to find a spatial variance model for the random variation, since empirical variograms estimated from the data by method-of-moments will be affected by both the random variation and that variation represented by the fixed effects. The geostatistical model of spatial variation is a special case of the linear mixed model where our data are modelled as the additive combination of fixed effects (e.g. the unknown mean, coefficients of a trend model), random effects (the spatially dependent random variation in the geostatistical context) and independent random error (nugget variation in geostatistics). Statisticians use residual maximum likelihood (REML) to estimate variance parameters, i.e. to obtain the variogram in a geostatistical context. REML estimates are consistent (they converge in probability to the parameters that are estimated) with less bias than both maximum likelihood estimates and method-of-moment estimates obtained from residuals of a fitted trend. If the estimate of the random effects variance model is inserted into the BLUP we have the empirical BLUP or E-BLUP. Despite representing the state of the art for prediction from a linear mixed model in statistics, the REML-E-BLUP has not been widely used in soil science, and in most studies reported in the soils literature the variogram is estimated with methods that are seriously biased if the fixed-effect structure is more complex than just an unknown constant mean (ordinary kriging). 
In this paper we describe the REML-E-BLUP and illustrate the method with some data on soil water content that exhibit a pronounced spatial trend.
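The E-BLUP in its mixed-model form reduces to generalised least squares for the trend plus a BLUP of the spatial effect. A minimal numpy sketch (the exponential covariance and nugget are assumed rather than REML-estimated, so this is the "BLUP given known variogram" step only):

```python
import numpy as np

# Mixed-model form of universal kriging: y = X*beta + u + e,
# u spatially correlated (assumed exponential covariance), e = nugget.
rng = np.random.default_rng(7)
s = rng.uniform(0, 10, size=(30, 2))                # sampled locations
d = np.linalg.norm(s[:, None] - s[None, :], axis=2)
G = 1.0 * np.exp(-d / 3.0)                          # spatial covariance (assumed)
R = 0.2 * np.eye(30)                                # nugget variance (assumed)
X = np.column_stack([np.ones(30), s])               # linear trend fixed effects
u = rng.multivariate_normal(np.zeros(30), G)
y = X @ np.array([5.0, 0.3, -0.2]) + u + rng.normal(0, np.sqrt(0.2), 30)
V = G + R
Vi = np.linalg.inv(V)
beta = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)  # GLS estimate of the trend
u_blup = G @ Vi @ (y - X @ beta)                    # BLUP of spatial effect
```

In the full REML-E-BLUP the covariance parameters of G and R would themselves be estimated by residual maximum likelihood before this solve, which is the bias-avoiding step the paper advocates.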


Journal ArticleDOI
TL;DR: This paper proposes an approach based on modelling quantile-like parameters (M-quantiles) of the conditional distribution of the target variable given the covariates, which avoids the problems associated with specification of random effects and allows inter-domain differences to be characterized by the variation of area-specific M-quantile coefficients.
Abstract: Small area estimation techniques are employed when sample data are insufficient for acceptably precise direct estimation in domains of interest. These techniques typically rely on regression models that use both covariates and random effects to explain variation between domains. However, such models also depend on strong distributional assumptions, require a formal specification of the random part of the model and do not easily allow for outlier robust inference. We describe a new approach to small area estimation that is based on modelling quantile-like parameters of the conditional distribution of the target variable given the covariates. This avoids the problems associated with specification of random effects, allowing inter-domain differences to be characterized by the variation of area-specific M-quantile coefficients. The proposed approach is easily made robust against outlying data values and can be adapted for estimation of a wide range of area specific parameters, including that of the quantiles of the distribution of the target variable in the different small areas. Results from two simulation studies comparing the performance of the M-quantile modelling approach with more traditional mixed model approaches are also provided.
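A close cousin of the paper's M-quantile fit is asymmetric least squares (expectile) regression, which can be computed by iteratively reweighted least squares. The sketch below omits the Huber influence function used in full M-quantile estimation, so it is a simplification, not the authors' estimator:

```python
import numpy as np

def expectile_fit(X, y, q=0.5, iters=50):
    """Fit the q-th expectile regression by IRLS with asymmetric weights."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        w = np.where(y - X @ beta >= 0, q, 1 - q)   # asymmetric weights
        beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
    return beta

rng = np.random.default_rng(8)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=200)
b50 = expectile_fit(X, y, q=0.5)   # reduces to ordinary least squares
b90 = expectile_fit(X, y, q=0.9)   # tracks the upper part of y given x
```

In the small-area setting, each area's data pick out their own coefficient q, and the spread of those area-specific coefficients plays the role that random effects play in mixed models.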

Journal ArticleDOI
TL;DR: A pairwise approach in which all possible bivariate models are fitted, and where inference follows from pseudo-likelihood arguments is proposed, applicable for linear, generalized linear, and nonlinear mixed models, or for combinations of these.
Abstract: A mixed model is a flexible tool for joint modeling purposes, especially when the gathered data are unbalanced. However, computational problems due to the dimension of the joint covariance matrix of the random effects arise as soon as the number of outcomes and/or the number of used random effects per outcome increases. We propose a pairwise approach in which all possible bivariate models are fitted, and where inference follows from pseudo-likelihood arguments. The approach is applicable for linear, generalized linear, and nonlinear mixed models, or for combinations of these. The methodology will be illustrated for linear mixed models in the analysis of 22-dimensional, highly unbalanced, longitudinal profiles of hearing thresholds.

Journal ArticleDOI
TL;DR: Insight is provided into the robustness of the MLEs against departure from the normal random-effects assumption, and bootstrap procedures are suggested to overcome the difficulty of obtaining reliable standard-error estimates.
Abstract: The maximum likelihood approach to jointly model the survival time and its longitudinal covariates has been successful in modelling both processes in longitudinal studies. Random effects in the longitudinal process are often used to model the survival times through a proportional hazards model, and this invokes an EM algorithm to search for the maximum likelihood estimates (MLEs). Several intriguing issues are examined here, including the robustness of the MLEs against departure from the normal random effects assumption, and difficulties with using the profile likelihood approach to provide reliable estimates of the standard errors of the MLEs. We provide insights into the robustness property and suggest overcoming the difficulty of obtaining reliable standard error estimates by using bootstrap procedures. Numerical studies and data analysis illustrate our points.
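The bootstrap remedy proposed above is generic: resample units, re-estimate, and take the standard deviation of the replicates. A minimal sketch for a simple estimator (the mean; the joint survival-longitudinal model itself is far heavier, so this only illustrates the resampling logic):

```python
import numpy as np

# Nonparametric bootstrap standard error, compared with the analytic SE.
rng = np.random.default_rng(9)
data = rng.exponential(2.0, size=300)          # toy sample
est = data.mean()
boot = np.array([rng.choice(data, size=data.size, replace=True).mean()
                 for _ in range(2000)])        # 2000 bootstrap replicates
se_boot = boot.std(ddof=1)
se_analytic = data.std(ddof=1) / np.sqrt(data.size)   # sd / sqrt(n)
```

In the joint-model context each "observation" resampled would be a whole subject (all longitudinal measurements plus the survival time), preserving within-subject dependence.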

Journal ArticleDOI
TL;DR: A class of multi-level ZIP regression model with random effects is presented to account for the preponderance of zero counts and the inherent correlation of observations and application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
Abstract: Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which render the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression model with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
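The single-level ZIP likelihood that the multi-level model extends is short enough to fit directly. A hedged sketch (simulated data with an assumed 30% inflation and Poisson mean 2.5; the paper's EM/REML machinery for random effects is not reproduced):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def zip_negloglik(params, y):
    """Negative log-likelihood of a zero-inflated Poisson."""
    pi = 1.0 / (1.0 + np.exp(-params[0]))     # zero-inflation probability
    lam = np.exp(params[1])                   # Poisson mean
    ll_zero = np.log(pi + (1 - pi) * np.exp(-lam))          # structural + Poisson zeros
    ll_pos = np.log(1 - pi) - lam + y * np.log(lam) - gammaln(y + 1)
    return -np.where(y == 0, ll_zero, ll_pos).sum()

rng = np.random.default_rng(10)
n = 5000
extra_zero = rng.random(n) < 0.3              # structural zeros
y = np.where(extra_zero, 0, rng.poisson(2.5, n))
fit = minimize(zip_negloglik, x0=[0.0, 0.0], args=(y,), method="Nelder-Mead")
pi_hat = 1.0 / (1.0 + np.exp(-fit.x[0]))
lam_hat = np.exp(fit.x[1])
```

The multi-level version in the paper adds random effects to both the Poisson and inflation parts, handled via EM with REML estimating equations for the variance components.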

Journal ArticleDOI
TL;DR: The authors show that conditional maximum likelihood can eliminate the bias arising when random effects are correlated with a predictor by partitioning the covariate into between- and within-cluster components; models that include separate terms for these components also eliminate the source of the bias.
Abstract: Summary. We consider the situation where the random effects in a generalized linear mixed model may be correlated with one of the predictors, which leads to inconsistent estimators. We show that conditional maximum likelihood can eliminate this bias. Conditional likelihood leads naturally to the partitioning of the covariate into between- and within-cluster components and models that include separate terms for these components also eliminate the source of the bias. Another viewpoint that we develop is the idea that many violations of the assumptions (including correlation between the random effects and a covariate) in a generalized linear mixed model may be cast as misspecified mixing distributions. We illustrate the results with two examples and simulations.
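The between-/within-cluster split described above (a Mundlak-type device) is easy to demonstrate in the linear case. This simulation (assumed values; linear rather than the paper's generalized setting) makes the random intercepts correlated with x, so the pooled slope is biased while the within-cluster slope is not:

```python
import numpy as np

# Cluster intercepts correlated with x: pooled OLS is biased,
# the within (cluster-demeaned) estimator is consistent.
rng = np.random.default_rng(11)
J, m = 200, 20
a = rng.normal(size=J)                        # cluster random intercepts
x = a[:, None] + rng.normal(size=(J, m))      # x correlated with intercepts
y = 1.0 * x + a[:, None] + rng.normal(size=(J, m))   # true slope = 1
b_pooled = np.polyfit(x.ravel(), y.ravel(), 1)[0]    # biased (toward 1.5 here)
xw = x - x.mean(axis=1, keepdims=True)        # within-cluster deviations
yw = y - y.mean(axis=1, keepdims=True)
b_within = (xw * yw).sum() / (xw ** 2).sum()  # close to the true slope 1
```

Including both the cluster-mean of x and the within-cluster deviation as separate regressors recovers the same unbiased within coefficient, which is the paper's recommended fix.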

Journal ArticleDOI
TL;DR: A multivariate mixed effects model is presented to explicitly capture two different sources of dependence among longitudinal measures over time as well as dependence between different variables in cancer and AIDS clinical trials.
Abstract: Joint modeling of longitudinal and survival data is becoming increasingly essential in most cancer and AIDS clinical trials. We propose a likelihood approach to extend both longitudinal and survival components to be multidimensional. A multivariate mixed effects model is presented to explicitly capture two different sources of dependence among longitudinal measures over time as well as dependence between different variables. For the survival component of the joint model, we introduce a shared frailty, which is assumed to have a positive stable distribution, to induce correlation between failure times. The proposed marginal univariate survival model, which accommodates both zero and nonzero cure fractions for the time to event, is then applied to each marginal survival function. The proposed multivariate survival model has a proportional hazards structure for the population hazard, conditionally as well as marginally, when the baseline covariates are specified through a specific mechanism. In addition, the model is capable of dealing with survival functions with different cure rate structures. The methodology is specifically applied to the International Breast Cancer Study Group (IBCSG) trial to investigate the relationship between quality of life, disease-free survival, and overall survival.

Journal ArticleDOI
TL;DR: Results indicated that when the structure is ignored, fixed-effect estimates were unaffected, but standard error estimates associated with the variables modeled incorrectly were biased.
Abstract: Cross-classified random effects modeling (CCREM) is used to model multilevel data from nonhierarchical contexts. These models are widely discussed but infrequently used in social science research. Because little research exists assessing when it is necessary to use CCREM, 2 studies were conducted. A real data set with a cross-classified structure was analyzed by comparing parameter estimates when ignoring versus modeling the cross-classified data structure. A follow-up simulation study investigated potential factors affecting the need to use CCREM. Results indicated that when the structure is ignored, fixed-effect estimates were unaffected, but standard error estimates associated with the variables modeled incorrectly were biased. Estimates of the variance components also displayed bias, which was related to several study factors.

Journal ArticleDOI
TL;DR: A random-effects model is proposed that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations and can be fitted deterministically without the need for time-consuming Monte Carlo approximations.
Abstract: Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm for which the E (expectation) and M (maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data.
In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is also considered. Availability: A Fortran program called EMMIX-WIRE (EM-based MIXture analysis WIth Random Effects) is available on request from the corresponding author. Contact: gjm@maths.uq.edu.au Supplementary information: http://www.maths.uq.edu.au/~gjm/bioinf0602_supp.pdf. Colour versions of Figures 1 and 2 are available as Supplementary material on Bioinformatics online.
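The normal mixture clustering at the core of this approach can be illustrated with a generic EM fit, here via scikit-learn's `GaussianMixture` rather than the authors' EMMIX-WIRE program, and without the random-effects extension. The synthetic "gene profiles", component count and noise level are all invented for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two synthetic "gene expression profiles" over 6 time points:
# an up-regulated and a down-regulated pattern, plus noise.
t = np.arange(6)
patterns = [np.sin(np.pi * t / 5), -np.sin(np.pi * t / 5)]
profiles = np.vstack([p + 0.3 * rng.standard_normal((50, 6)) for p in patterns])

# Fit a two-component normal mixture by EM and read off cluster labels.
gm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
labels = gm.fit_predict(profiles)
```

With well-separated patterns like these, the EM fit recovers the two groups almost exactly; the paper's contribution is to keep this deterministic EM machinery while adding random effects for correlated and replicated profiles.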


01 Jan 2006
TL;DR: In this paper, the authors used an extensive database from the State of Florida to test many of the central assumptions of existing models and determine the impact of alternative methods on measures of teacher quality, finding that the commonly used "restricted value added" or "achievement gain" model is a good approximation of the more cumbersome cumulative achievement model.
Abstract: The recent availability of administrative databases that track individual students and their teachers over time has led to both a surge in research measuring teacher quality and interest in developing accountability systems for teachers. Existing studies employ a variety of empirical models, yet few studies explicitly state or test the assumptions underlying their models. Using an extensive database from the State of Florida, we test many of the central assumptions of existing models and determine the impact of alternative methods on measures of teacher quality. We find that the commonly used “restricted value-added” or “achievement-gain” model is a good approximation of the more cumbersome cumulative achievement model. Within the context of the restricted value-added model, we find it is important to control for unmeasured student, teacher and school heterogeneity. Relying on measurable characteristics of students, teachers and schools alone likely produces inconsistent estimates of the effects of teacher characteristics on student achievement. Moreover, individual-specific heterogeneity is more appropriately captured by fixed effects than by random effects; the random effects estimator yields inconsistent parameter estimates and estimates of time-invariant teacher quality that diverge significantly from the fixed effects estimator. In contrast, the exclusion of peer characteristics and class size each has relatively little effect on the estimates of teacher quality. Using aggregated grade-within-school measures of teacher characteristics produces somewhat less precise estimates of the impact of teacher professional development than do measures of the characteristics of specific teachers. Otherwise, aggregation to the grade level does not have a substantial effect. These findings suggest that many models currently employed to measure the impact of teachers on student achievement are mis-specified.

01 Jan 2006
TL;DR: In this monograph, a thorough treatment of methods for solving over- and under-determined systems of equations, e.g., the minimum norm solution method with respect to weighted norms, is presented.
Abstract: This monograph contains a thorough treatment of methods for solving over- and under-determined systems of equations, e.g. the minimum norm solution method with respect to weighted norms. The considered equations can be nonlinear or linear, and deterministic models as well as probabilistic ones are considered. An extensive appendix provides all necessary prerequisites such as matrix algebra, matrix analysis and Lagrange multipliers, and a long list of references is also included.

Book
27 Oct 2006
TL;DR: This book explains why the criterion for fitting mixed models is called residual maximum likelihood (REML) and covers the estimation of random effects in mixed models via best linear unbiased predictors (BLUPs).
Abstract: Preface. 1. The need for more than one random-effect term when fitting a regression line. 2. The need for more than one random-effect term in a designed experiment. 3. Estimation of the variances of random-effect terms. 4. Interval estimates for fixed-effect terms in mixed models. 5. Estimation of random effects in mixed models: best linear unbiased predictors. 6. More advanced mixed models for more elaborate data sets. 7. Two case studies. 8. The use of mixed models for the analysis of unbalanced experimental designs. 9. Beyond mixed modelling. 10. Why is the criterion for fitting mixed models called residual maximum likelihood? References. Index.
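A minimal sketch of the book's two central objects, REML fitting of a mixed model and BLUPs of the random effects, can be given with statsmodels' `MixedLM` on simulated grouped data. The data, group structure and parameter values below are invented for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_groups, n_per = 10, 20
g = np.repeat(np.arange(n_groups), n_per)
u = rng.normal(0.0, 2.0, n_groups)           # true random intercepts
x = rng.normal(size=n_groups * n_per)
y = 1.0 + 0.5 * x + u[g] + rng.normal(0.0, 1.0, n_groups * n_per)
df = pd.DataFrame({"y": y, "x": x, "g": g})

# Fit by residual maximum likelihood (REML, the book's fitting criterion).
fit = smf.mixedlm("y ~ x", df, groups=df["g"]).fit(reml=True)
blups = fit.random_effects   # dict: group -> BLUP of its random intercept
```

The fixed-effect slope estimate lands near the true value 0.5, and `blups` gives one shrunken (best linear unbiased) prediction per group, the quantities Chapters 4 and 5 of the book develop.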

Journal ArticleDOI
TL;DR: In this paper, the authors consider consistent estimation of partially linear panel data models with fixed effects and propose profile-likelihood-based estimators for both the parametric and nonparametric components in the models.

Journal ArticleDOI
TL;DR: In this article, a random effects model for PM2.5 concentrations in three midwestern U.S. states for the year 2001 is proposed, which enables full inference with regard to process unknowns as well as predictions in time and space.
Abstract: Studies indicate that even short-term exposure to high concentrations of fine atmospheric particulate matter (PM2.5) can lead to long-term health effects. In this article, we propose a random effects model for PM2.5 concentrations. In particular, we anticipate urban/rural differences with regard to both mean levels and variability. Hence we introduce two random effects components, one for rural or background levels and the other as a supplement for urban areas. These are specified in the form of spatio-temporal processes. Weighting these processes through a population density surface results in nonstationarity in space. We analyze daily PM2.5 concentrations in three midwestern U.S. states for the year 2001. A fully Bayesian model is implemented using MCMC techniques, enabling full inference with regard to process unknowns as well as predictions in time and space.
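The two-component idea, a background process plus a population-weighted urban supplement that makes the variance nonstationary in space, can be mimicked in a toy simulation. This is a numpy sketch of the variance structure only, not the authors' Bayesian spatio-temporal model, and all sizes and scales are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n_sites, n_days = 30, 100
pop = rng.uniform(0.0, 1.0, n_sites)   # population-density weight per site

# Background (rural) component at every site, plus an urban supplement
# scaled by population density: variance then grows with density.
rural = rng.normal(0.0, 1.0, (n_sites, n_days))
urban = rng.normal(0.0, 1.5, (n_sites, n_days))
pm25 = 10.0 + rural + pop[:, None] * urban

hi = pop > np.median(pop)
print(pm25[hi].var(), pm25[~hi].var())  # densely populated sites vary more
```

Weighting the urban process by the density surface is what produces the spatial nonstationarity: two sites share the same background variability but differ in total variance according to their population weight.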

Posted Content
TL;DR: In this article, the authors analyze different panel data frontier models in terms of their ability to separate exogenous cost factors from operators' inherent inefficiencies, and show that Greene's "true" random effects model demonstrates a considerable advantage over other models in separating heterogeneous factors from innate inefficiencies.
Abstract: As the Swiss government seeks to reorganize regional bus services and put operation of lines out for public bid, benchmarking methods are needed to evaluate the bids and the requested subsidies and to adjust the minimum bids needed to win operating rights. That is done by comparing individual operators to "best" observed practice, but to do so accurately, it is necessary to separate out exogenous factors that could contribute to higher costs from the operators' inherent operating efficiencies. The purpose of this study is to analyze different panel data frontier models in terms of their ability to make this distinction. In the sample that is studied, Greene's "true" random effects model demonstrates a considerable advantage over other models in separating heterogeneous factors from innate inefficiencies. Models that do not have this advantage tend to overestimate inefficiencies by operators. The "true" random effects model could be valuable for setting a benchmark in regulating network industries, though care must be taken to account for cost pressures and structures unique to each industry and unaccounted for by the mechanical use of the model.

Journal ArticleDOI
TL;DR: A new Stata command, redpace, is presented and illustrated for maximum simulated likelihood (MSL) estimation of random-effects dynamic probit models with autocorrelated errors, and the use of pseudorandom numbers and Halton sequences of quasirandom numbers for MSL estimation of these models is compared.
Abstract: This paper investigates using maximum simulated likelihood (MSL) estimation for random-effects dynamic probit models with autocorrelated errors. It presents and illustrates a new Stata command, redpace, for this estimator. The paper also compares using pseudorandom numbers and Halton sequences of quasirandom numbers for MSL estimation of these models.
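The core MSL idea, replacing the integral over the random effect with an average over simulated draws, can be sketched outside Stata. Here a one-dimensional random-effects probit probability is simulated with pseudorandom versus Halton draws (scipy's `qmc.Halton`); all parameter values are invented, and for this simple case the integral has a closed form, so both approximations can be checked against it:

```python
import numpy as np
from scipy.stats import norm, qmc

def sim_prob(xb, sigma, u):
    """Simulated P(y=1 | x): average of Phi(xb + sigma*u) over draws of u."""
    return norm.cdf(xb + sigma * u).mean()

xb, sigma, R = 0.3, 1.0, 200

pseudo = np.random.default_rng(0).standard_normal(R)
# Halton quasirandom points on (0, 1), mapped to N(0, 1) draws.
halton = norm.ppf(qmc.Halton(d=1, seed=0).random(R).ravel())

exact = norm.cdf(xb / np.sqrt(1.0 + sigma**2))  # closed form for this integral
```

In one dimension with this many draws, the Halton average typically sits closer to the closed-form value than the pseudorandom one, which is the motivation for offering quasirandom draws in MSL estimators such as redpace.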

Journal Article
TL;DR: The h-likelihood provides a unified framework for this new class of models and gives a single algorithm for fitting all members of the class, which will enable models with heavy-tailed distributions to be explored and provide robust estimation against outliers.
Abstract: We propose a class of double hierarchical generalized linear models in which random effects can be specified for both the mean and dispersion. Heteroscedasticity between clusters can be modelled by introducing random effects in the dispersion model, as is heterogeneity between clusters in the mean model. This class will, among other things, enable models with heavy-tailed distributions to be explored, providing robust estimation against outliers. The h-likelihood provides a unified framework for this new class of models and gives a single algorithm for fitting all members of the class. This algorithm does not require quadrature or prior probabilities.

Journal ArticleDOI
TL;DR: In this article, the authors explored the application of several panel data models in measuring productive efficiency of the electricity distribution sector and showed that alternative panel models such as the true random effects model proposed by Greene (2005) could be used to explore the possible impacts of unobserved firm-specific factors on efficiency estimates.
Abstract: This paper explores the application of several panel data models in measuring productive efficiency of the electricity distribution sector. Stochastic Frontier Analysis has been used to estimate the cost-efficiency of 59 distribution utilities operating over a nine-year period in Switzerland. The estimated coefficients and inefficiency scores are compared across three different panel data models. The results indicate that individual efficiency estimates are sensitive to the econometric specification of unobserved firm-specific heterogeneity. This paper shows that alternative panel models such as the ‘true’ random effects model proposed by Greene (2005) could be used to explore the possible impacts of unobserved firm-specific factors on efficiency estimates. When these factors are specified as a separate stochastic term, the efficiency estimates are substantially higher, suggesting that conventional models could confound efficiency differences with other unobserved variations among companies. On the other hand, refined specification of unobserved heterogeneity might lead to an underestimation of inefficiencies by mistaking potential persistent inefficiencies for external factors. Given that specification of inefficiency and heterogeneity relies on non-testable assumptions, there is no conclusive evidence in favour of one or the other specification. However, this paper argues that alternative panel data models along with conventional estimators can be used to obtain approximate lower and upper bounds for companies' efficiency scores.
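The paper's key point, that pooled frontier estimates can confound firm-specific heterogeneity with inefficiency, can be illustrated with a toy simulation. This is a numpy sketch under invented parameters, not a stochastic frontier estimator:

```python
import numpy as np

rng = np.random.default_rng(3)
n_firms, n_years = 59, 9
firm = np.repeat(np.arange(n_firms), n_years)

hetero = rng.normal(0.0, 0.3, n_firms)         # firm-specific cost heterogeneity
ineff = np.abs(rng.normal(0.0, 0.1, n_firms))  # persistent inefficiency (half-normal)
noise = rng.normal(0.0, 0.05, n_firms * n_years)
log_cost = 1.0 + hetero[firm] + ineff[firm] + noise

# A pooled view attributes the whole firm-level spread to "inefficiency";
# removing firm means (the intuition behind 'true' effects models) leaves
# only the much smaller transient variation.
firm_means = np.bincount(firm, weights=log_cost) / n_years
pooled_spread = log_cost.std()
within_spread = (log_cost - firm_means[firm]).std()
```

The pooled spread is dominated by heterogeneity that has nothing to do with inefficiency, which is exactly why the abstract warns that conventional models overstate inefficiency while fully absorbing firm effects risks hiding persistent inefficiency in the heterogeneity term.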