
Showing papers in "Journal of The Royal Statistical Society Series B-statistical Methodology in 2001"


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a method called the "gap statistic" for estimating the number of clusters (groups) in a set of data, which uses the output of any clustering algorithm (e.g. K-means or hierarchical), comparing the change in within-cluster dispersion with that expected under an appropriate reference null distribution.
Abstract: We propose a method (the ‘gap statistic’) for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. K-means or hierarchical), comparing the change in within-cluster dispersion with that expected under an appropriate reference null distribution. Some theory is developed for the proposal and a simulation study shows that the gap statistic usually outperforms other methods that have been proposed in the literature.

4,283 citations
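
The selection rule behind the gap statistic can be sketched compactly. The following is a minimal illustration, not the authors' code: it assumes K-means via scikit-learn (whose inertia_ attribute gives the within-cluster sum of squares W_k), draws reference data uniformly over the bounding box of the observed data, and picks the smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.

# Minimal sketch of the gap statistic; reference sets are uniform over the
# data's bounding box, and W_k is the pooled within-cluster sum of squares.
import numpy as np
from sklearn.cluster import KMeans

def log_wk(X, k, seed=0):
    # log W_k for a k-cluster K-means solution.
    return np.log(KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_)

def gap_statistic(X, k_max=10, B=20, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = [], []
    for k in range(1, k_max + 1):
        ref = np.array([log_wk(rng.uniform(lo, hi, X.shape), k, seed) for _ in range(B)])
        gaps.append(ref.mean() - log_wk(X, k, seed))
        s.append(ref.std() * np.sqrt(1 + 1.0 / B))
    for k in range(1, k_max):               # smallest k with Gap(k) >= Gap(k+1) - s_{k+1}
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps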


Journal ArticleDOI
TL;DR: A Bayesian calibration technique is presented which improves on the traditional ad hoc calibration approach in two respects: the predictions allow for all sources of uncertainty, including the remaining uncertainty over the fitted parameters, and they attempt to correct for any inadequacy of the model revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values.
Abstract: We consider prediction and uncertainty analysis for systems which are approximated using complex mathematical models. Such models, implemented as computer codes, are often generic in the sense that by a suitable choice of some of the model's input parameters the code can be used to predict the behaviour of the system in a variety of specific applications. However, in any specific application the values of necessary parameters may be unknown. In this case, physical observations of the system in the specific context are used to learn about the unknown parameters. The process of fitting the model to the observed data by adjusting the parameters is known as calibration. Calibration is typically effected by ad hoc fitting, and after calibration the model is used, with the fitted input values, to predict the future behaviour of the system. We present a Bayesian calibration technique which improves on this traditional approach in two respects. First, the predictions allow for all sources of uncertainty, including the remaining uncertainty over the fitted parameters. Second, they attempt to correct for any inadequacy of the model which is revealed by a discrepancy between the observed data and the model predictions from even the best-fitting parameter values. The method is illustrated by using data from a nuclear radiation release at Tomsk, and from a more complex simulated nuclear accident exercise.

3,745 citations
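
The calibration framework in the abstract is often written in the following form (the notation here is illustrative rather than a quotation from the paper): a field observation combines the code output at the true but unknown calibration inputs with a model-inadequacy term and observation error,

\[
  z_i \;=\; \zeta(x_i) + e_i \;=\; \rho\,\eta(x_i, \theta) + \delta(x_i) + e_i,
  \qquad e_i \sim N(0, \lambda^2),
\]

where \eta(\cdot,\cdot) denotes the computer code, \theta the unknown calibration parameters and \delta(\cdot) the discrepancy function; Gaussian process priors on \eta and \delta allow calibration and inadequacy correction to be handled in a single Bayesian analysis.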


Journal ArticleDOI
TL;DR: The authors construct continuous time stochastic volatility models for financial assets where the volatility processes are superpositions of positive Ornstein-Uhlenbeck (OU) processes, and study these models in relation to financial data and theory.
Abstract: Non-Gaussian processes of Ornstein–Uhlenbeck (OU) type offer the possibility of capturing important distributional deviations from Gaussianity and for flexible modelling of dependence structures. This paper develops this potential, drawing on and extending powerful results from probability theory for applications in statistical analysis. Their power is illustrated by a sustained application of OU processes within the context of finance and econometrics. We construct continuous time stochastic volatility models for financial assets where the volatility processes are superpositions of positive OU processes, and we study these models in relation to financial data and theory.

1,991 citations
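
In the notation commonly used for these models (shown here for orientation, not as a quotation from the paper), the log-price x(t) and the spot variance evolve as

\[
  \mathrm{d}x(t) = \{\mu + \beta\,\sigma^2(t)\}\,\mathrm{d}t + \sigma(t)\,\mathrm{d}w(t),
  \qquad
  \mathrm{d}\sigma^2(t) = -\lambda\,\sigma^2(t)\,\mathrm{d}t + \mathrm{d}z(\lambda t),
\]

where w is Brownian motion and z is a non-decreasing Lévy process (the background driving Lévy process); the superpositions mentioned in the abstract take \sigma^2(t) to be a sum of independent OU components of this type with different decay rates \lambda.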


Journal ArticleDOI
TL;DR: This work proposes a new simulation technique for tracking moving target distributions which, unlike existing particle filters, does not suffer from progressive degeneration as the target sequence evolves.
Abstract: Markov chain Monte Carlo (MCMC) sampling is a numerically intensive simulation technique which has greatly improved the practicality of Bayesian inference and prediction. However, MCMC sampling is too slow to be of practical use in problems involving a large number of posterior (target) distributions, as in dynamic modelling and predictive model selection. Alternative simulation techniques for tracking moving target distributions, known as particle filters, which combine importance sampling, importance resampling and MCMC sampling, tend to suffer from a progressive degeneration as the target sequence evolves. We propose a new technique, based on these same simulation methodologies, which does not suffer from this progressive degeneration.

828 citations
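
The combination described above can be sketched generically: weight, resample, then apply an MCMC move to each particle so that the sample does not collapse onto a few values. The state model, likelihood and move kernel below are placeholders chosen only for illustration, not the paper's; in a full implementation the move must leave the current filtering distribution invariant.

# Generic resample-move sketch: propagate, weight, resample, then rejuvenate
# each particle with a Metropolis move. All model choices here are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def log_lik(y, x):
    return -0.5 * (y - x) ** 2                            # placeholder: y | x ~ N(x, 1)

def filter_step(particles, y, move_scale=0.5):
    n = particles.size
    prop = particles + rng.normal(0.0, 1.0, n)            # placeholder random-walk dynamics
    logw = log_lik(y, prop)                               # importance weights
    w = np.exp(logw - logw.max()); w /= w.sum()
    prop = prop[rng.choice(n, size=n, p=w)]               # multinomial resampling
    cand = prop + rng.normal(0.0, move_scale, n)          # MCMC rejuvenation move
    accept = np.log(rng.uniform(size=n)) < log_lik(y, cand) - log_lik(y, prop)
    prop[accept] = cand[accept]                           # flat-prior Metropolis, for illustration
    return prop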


Journal ArticleDOI
TL;DR: Gaussian Markov random fields (conditional autoregressions) can be sampled quickly by using numerical techniques for sparse matrices, and the algorithm's use is demonstrated by constructing efficient block updates in Markov chain Monte Carlo algorithms for disease mapping.
Abstract: This paper demonstrates how Gaussian Markov random fields (conditional autoregressions) can be sampled quickly by using numerical techniques for sparse matrices. The algorithm is general and efficient, and expands easily to various forms for conditional simulation and evaluation of normalization constants. We demonstrate its use by constructing efficient block updates in Markov chain Monte Carlo algorithms for disease mapping.

395 citations
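
The core numerical trick is short to state: if the precision matrix Q is factorized as Q = L L', a draw from N(mu, Q^{-1}) is obtained by solving one triangular system. The sketch below uses dense numpy factors for clarity; the speed reported in the paper comes from doing the same computation with sparse Cholesky factorizations and fill-reducing reorderings, which are not shown here.

# Sampling x ~ N(mu, Q^{-1}) from the precision matrix Q via its Cholesky factor:
# with Q = L L^T and z ~ N(0, I), the solution of L^T v = z has covariance Q^{-1}.
import numpy as np
from scipy.linalg import solve_triangular

def sample_gmrf(mu, Q, rng):
    L = np.linalg.cholesky(Q)                        # lower-triangular factor of Q
    z = rng.standard_normal(len(mu))
    v = solve_triangular(L.T, z, lower=False)        # v = L^{-T} z  =>  v ~ N(0, Q^{-1})
    return mu + v

# Example: a tridiagonal (first-order autoregressive) precision on a line graph.
n = 200
rng = np.random.default_rng(1)
Q = 2.01 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x = sample_gmrf(np.zeros(n), Q, rng)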


Journal ArticleDOI
TL;DR: FLDA can be used to produce classifications on new (test) curves, give an estimate of the discriminant function between classes and provide a one‐ or two‐dimensional pictorial representation of a set of curves.
Abstract: We introduce a technique for extending the classical method of linear discriminant analysis (LDA) to data sets where the predictor variables are curves or functions. This procedure, which we call functional linear discriminant analysis (FLDA), is particularly useful when only fragments of the curves are observed. All the techniques associated with LDA can be extended for use with FLDA. In particular FLDA can be used to produce classifications on new (test) curves, give an estimate of the discriminant function between classes and provide a one- or two-dimensional pictorial representation of a set of curves. We also extend this procedure to provide generalizations of quadratic and regularized discriminant analysis.

309 citations


Journal ArticleDOI
TL;DR: In this article, a modified LRT for homogeneity in finite mixture models with a general parametric kernel distribution family is proposed, which has a χ²-type null limiting distribution and is asymptotically most powerful under local alternatives.
Abstract: Summary. Testing for homogeneity in finite mixture models has been investigated by many researchers. The asymptotic null distribution of the likelihood ratio test (LRT) is very complex and difficult to use in practice. We propose a modified LRT for homogeneity in finite mixture models with a general parametric kernel distribution family. The modified LRT has a χ²-type null limiting distribution and is asymptotically most powerful under local alternatives. Simulations show that it performs better than competing tests. They also reveal that the limiting distribution with some adjustment can satisfactorily approximate the quantiles of the test statistic, even for moderate sample sizes.

237 citations


Journal ArticleDOI
TL;DR: A flexible class of Cox processes whose stochastic intensity is a space–time Ornstein–Uhlenbeck process is described, and moment‐based methods of parameter estimation are developed.
Abstract: Space–time point pattern data have become more widely available as a result of technological developments in areas such as geographic information systems. We describe a flexible class of space–time point processes. Our models are Cox processes whose stochastic intensity is a space–time Ornstein–Uhlenbeck process. We develop moment-based methods of parameter estimation, show how to predict the underlying intensity by using a Markov chain Monte Carlo approach and illustrate the performance of our methods on a synthetic data set.

221 citations


Journal ArticleDOI
TL;DR: In this paper, the mean function at each time period is modelled as a locally weighted mixture of linear regressions, and the regression coefficients are allowed to change through time, capturing temporal variation such as trends, seasonal effects and autoregressions.
Abstract: We propose a model for non-stationary spatiotemporal data. To account for spatial variability, we model the mean function at each time period as a locally weighted mixture of linear regressions. To incorporate temporal variation, we allow the regression coefficients to change through time. The model is cast in a Gaussian state space framework, which allows us to include temporal components such as trends, seasonal effects and autoregressions, and permits a fast implementation and full probabilistic inference for the parameters, interpolations and forecasts. To illustrate the model, we apply it to two large environmental data sets: tropical rainfall levels and Atlantic Ocean temperatures.

219 citations


Journal ArticleDOI
TL;DR: The authors proposed a method to assess the local influence in a minor perturbation of a statistical model with incomplete data using Cook's approach to the conditional expectation of the complete data log-likelihood function in the EM algorithm.
Abstract: This paper proposes a method to assess the local influence in a minor perturbation of a statistical model with incomplete data. The idea is to utilize Cook's approach to the conditional expectation of the complete-data log-likelihood function in the EM algorithm. It is shown that the method proposed produces analytic results that are very similar to those obtained from a classical local influence approach based on the observed data likelihood function and has the potential to assess a variety of complicated models that cannot be handled by existing methods. An application to the generalized linear mixed model is investigated. Some illustrative artificial and real examples are presented.

172 citations


Journal ArticleDOI
TL;DR: In this article, a parametric inverse regression (PIR) is proposed to estimate the dimension of a regression at the outset of an analysis, where smooth parametric curves are fitted to the p inverse regressions via a multivariate linear model.
Abstract: A new estimation method for the dimension of a regression at the outset of an analysis is proposed. A linear subspace spanned by projections of the regressor vector X, which contains part or all of the modelling information for the regression of a vector Y on X, and its dimension are estimated by means of parametric inverse regression. Smooth parametric curves are fitted to the p inverse regressions via a multivariate linear model. No restrictions are placed on the distribution of the regressors. The estimate of the dimension of the regression is based on optimal estimation procedures. A simulation study shows the method to be more powerful than sliced inverse regression in some situations.

Journal ArticleDOI
TL;DR: In this article, a new approach is suggested for choosing the threshold when fitting the Hill estimator of a tail exponent to extreme value data, based on an easily computed diagnostic, which in turn is founded directly on the Hill estimator itself, 'symmetrized' to remove the effect of the tail exponent but designed to emphasize biases in estimates of that exponent.
Abstract: Summary. A new approach is suggested for choosing the threshold when fitting the Hill estimator of a tail exponent to extreme value data. Our method is based on an easily computed diagnostic, which in turn is founded directly on the Hill estimator itself, 'symmetrized' to remove the effect of the tail exponent but designed to emphasize biases in estimates of that exponent. The attractions of the method are its accuracy, its simplicity and the generality with which it applies. This generality implies that the technique has somewhat different goals from more conventional approaches, which are designed to accommodate the minor component of a postulated two-component Pareto mixture. Our approach does not rely on the second component being Pareto distributed. Nevertheless, in the conventional setting it performs competitively with recently proposed methods, and in more general cases it achieves optimal rates of convergence. A by-product of our development is a very simple and practicable exponential approximation to the distribution of the Hill estimator under departures from the Pareto distribution.
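
The threshold-choice diagnostic itself is not reproduced here, but the quantity everything is built around, the Hill estimator computed from the k largest order statistics, is simple; the sketch below (with an illustrative Pareto example) shows that estimator only.

# Hill estimator of the reciprocal tail exponent from the k largest order statistics.
import numpy as np

def hill_estimator(x, k):
    xs = np.sort(np.asarray(x, dtype=float))[::-1]        # descending order statistics
    return np.mean(np.log(xs[:k]) - np.log(xs[k]))         # mean log-exceedance over X_(k+1)

# Illustration: standard Pareto data with tail exponent 2, so the target value is 0.5.
rng = np.random.default_rng(0)
sample = rng.pareto(2.0, size=5000) + 1.0                  # classical Pareto on [1, inf)
print(hill_estimator(sample, k=200))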

Journal ArticleDOI
TL;DR: In this article, the authors consider a sequence of posterior distributions based on a data-dependent prior (which they shall refer to as a pseudoposterior distribution) and establish simple conditions under which the sequence is Hellinger consistent.
Abstract: We consider a sequence of posterior distributions based on a data-dependent prior (which we shall refer to as a pseudoposterior distribution) and establish simple conditions under which the sequence is Hellinger consistent. It is shown how investigations into these pseudoposteriors assist with the understanding of some true posterior distributions, including Polya trees, the infinite dimensional exponential family and mixture models.

Journal ArticleDOI
TL;DR: In this paper, a new Fourier-von Mises image model is identified, with phase differences between Fourier-transformed images having von Mises distributions, and null set distortion criteria are proposed, with each criterion uniquely minimized by a particular set of polynomial functions.
Abstract: A warping is a function that deforms images by mapping between image domains. The choice of function is formulated statistically as maximum penalized likelihood, where the likelihood measures the similarity between images after warping and the penalty is a measure of distortion of a warping. The paper addresses two issues simultaneously, of how to choose the warping function and how to assess the alignment. A new, Fourier–von Mises image model is identified, with phase differences between Fourier-transformed images having von Mises distributions. Also, new, null set distortion criteria are proposed, with each criterion uniquely minimized by a particular set of polynomial functions. A conjugate gradient algorithm is used to estimate the warping function, which is numerically approximated by a piecewise bilinear function. The method is motivated by, and used to solve, three applied problems: to register a remotely sensed image with a map, to align microscope images obtained by using different optics and to discriminate between species of fish from photographic images.

Journal ArticleDOI
TL;DR: In this paper, a class of cohort sampling designs, including nested case control, case-cohort and classical case-control, is studied through a unified approach using Cox's proportional hazards model.
Abstract: A class of cohort sampling designs, including nested case–control, case–cohort and classical case–control designs involving survival data, is studied through a unified approach using Cox’s proportional hazards model. By finding an optimal sample reuse method via local averaging, a closed form estimating function is obtained, leading directly to the estimators of the regression parameters that are relatively easy to compute and are more efficient than some commonly used estimators in case–cohort and nested case–control studies. A semiparametric efficient estimator can also be found with some further computation. In addition, the class of sampling designs in this study provides a variety of sampling options and relaxes the restrictions of sampling schemes that are currently available.

Journal ArticleDOI
TL;DR: A two‐stage algorithm for computing maximum likelihood estimates for a class of spatial models that combines Markov chain Monte Carlo methods, and stochastic approximation methods such as the off‐line average and adaptive search direction is proposed.
Abstract: We propose a two-stage algorithm for computing maximum likelihood estimates for a class of spatial models. The algorithm combines Markov chain Monte Carlo methods such as the Metropolis-Hastings-Green algorithm and the Gibbs sampler, and stochastic approximation methods such as the off-line average and adaptive search direction. A new criterion is built into the algorithm so stopping is automatic once the desired precision has been set. Simulation studies and applications to some real data sets have been conducted with three spatial models. We compared the algorithm proposed with a direct application of the classical Robbins-Monro algorithm using Wiebe's wheat data and found that our procedure is at least 15 times faster.

Journal ArticleDOI
TL;DR: In this article, a general approach to local sensitivity analysis for selectivity bias is developed, which aims to study the sensitivity of inference to small departures from tacit assumptions of ignorability or randomness.
Abstract: Summary. Observational data analysis is often based on tacit assumptions of ignorability or randomness. The paper develops a general approach to local sensitivity analysis for selectivity bias, which aims to study the sensitivity of inference to small departures from such assumptions. If M is a model assuming ignorability, we surround M by a small neighbourhood N defined in the sense of Kullback-Leibler divergence and then compare the inference for models in N with that for M. Interpretable bounds for such differences are developed. Applications to missing data and to observational comparisons are discussed. Local approximations to sensitivity analysis are model robust and can be applied to a wide range of statistical problems.

Journal ArticleDOI
TL;DR: In this article, the authors derive an infinite dimensional score equation and suggest an algorithm to estimate the shape function for a simple shape invariant model; unlike in the usual kernel smoothing situation, no bandwidth or kernel function needs to be selected, since the score equation automatically selects the shape and the smoothing parameter for the estimation.
Abstract: The analysis of a sample of curves can be done by self-modelling regression methods. Within this framework we follow the ideas of nonparametric maximum likelihood estimation known from event history analysis and the counting process set-up. We derive an infinite dimensional score equation and from there we suggest an algorithm to estimate the shape function for a simple shape invariant model. The nonparametric maximum likelihood estimator that we find turns out to be a Nadaraya–Watson-like estimator, but unlike in the usual kernel smoothing situation we do not need to select a bandwidth or even a kernel function, since the score equation automatically selects the shape and the smoothing parameter for the estimation. We apply the method to a sample of electrophoretic spectra to illustrate how it works.

Journal ArticleDOI
TL;DR: In this paper, a method of constructing E(s²)-optimal supersaturated designs was presented which allows a reasonably complete solution to be found for various numbers of runs n including n = 8, 12, 16, 20, 24, 32, 40, 48, 64.
Abstract: There has been much recent interest in supersaturated designs and their application in factor screening experiments. Supersaturated designs have mainly been constructed by using the E(s²)-optimality criterion originally proposed by Booth and Cox in 1962. However, until now E(s²)-optimal designs have only been established with certainty for n experimental runs when the number of factors m is a multiple of n - 1, and in adjacent cases where m = q(n - 1) + r (|r| ≤ 2, q an integer). A method of constructing E(s²)-optimal designs is presented which allows a reasonably complete solution to be found for various numbers of runs n including n = 8, 12, 16, 20, 24, 32, 40, 48, 64.
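
The E(s²) criterion referred to above is the average squared off-diagonal entry of X'X over all pairs of factor columns. The sketch below only evaluates that criterion for a given ±1 design matrix; constructing designs that minimize it, the subject of the paper, is not attempted.

# E(s^2) for an n-by-m supersaturated design X with entries +/-1:
# the mean of s_ij^2 over the m(m-1)/2 pairs of columns, s_ij = (X'X)_ij.
import numpy as np
from itertools import combinations

def e_s2(X):
    m = X.shape[1]
    S = X.T @ X
    return sum(S[i, j] ** 2 for i, j in combinations(range(m), 2)) / (m * (m - 1) / 2)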

Journal ArticleDOI
TL;DR: In this paper, it is shown that a perfect version of the slice sampler can be easily implemented, at least when the target distribution is bounded, by exploiting the monotonicity properties of the slice sampler.
Abstract: Perfect sampling allows the exact simulation of random variables from the stationary measure of a Markov chain. By exploiting monotonicity properties of the slice sampler we show that a perfect version of the algorithm can be easily implemented, at least when the target distribution is bounded. Various extensions, including perfect product slice samplers, and examples of applications are discussed.
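
The perfect-sampling construction is not reproduced here, but the slice sampler whose monotonicity it exploits is short to state. A minimal univariate sketch for a bounded target with known support [a, b], drawing the auxiliary level and then a point of the slice by rejection within [a, b]:

# Basic univariate slice sampler: alternate u ~ U(0, f(x)) with a uniform draw
# from the slice {x : f(x) > u}, obtained here by rejection within [a, b].
import numpy as np

def slice_sampler(f, a, b, x0, n_iter, rng):
    x, draws = x0, np.empty(n_iter)
    for t in range(n_iter):
        u = rng.uniform(0.0, f(x))           # vertical level under the (unnormalized) density
        while True:
            x_new = rng.uniform(a, b)        # uniform proposal on the support
            if f(x_new) > u:                 # accepted iff it lies in the slice
                x = x_new
                break
        draws[t] = x
    return draws

# Example: target proportional to a normal density truncated to [-3, 3].
rng = np.random.default_rng(0)
samples = slice_sampler(lambda x: np.exp(-0.5 * x * x), -3.0, 3.0, 0.0, 2000, rng)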

Journal ArticleDOI
TL;DR: In this article, the authors consider regression parameter estimation in the Cox failure time model when regression variables are subject to measurement error and propose a risk set regression calibration estimator, which is based on least squares calibration in each risk set.
Abstract: Regression parameter estimation in the Cox failure time model is considered when regression variables are subject to measurement error. Assuming that repeat regression vector measurements adhere to a classical measurement model, we can consider an ordinary regression calibration approach in which the unobserved covariates are replaced by an estimate of their conditional expectation given available covariate measurements. However, since the rate of withdrawal from the risk set across the time axis, due to failure or censoring, will typically depend on covariates, we may improve the regression parameter estimator by recalibrating within each risk set. The asymptotic and small sample properties of such a risk set regression calibration estimator are studied. A simple estimator based on a least squares calibration in each risk set appears able to eliminate much of the bias that attends the ordinary regression calibration estimator under extreme measurement error circumstances. Corresponding asymptotic distribution theory is developed, small sample properties are studied using computer simulations and an illustration is provided.

Journal ArticleDOI
TL;DR: In this article, a Bayesian analysis of a piecewise linear model constructed using basis functions is presented, where prior distributions are adopted on both the number and the locations of the splines, leading to a model averaging approach to prediction with predictive distributions that take into account model uncertainty.
Abstract: Summary. We present a Bayesian analysis of a piecewise linear model constructed using basis functions which generalizes the univariate linear spline to higher dimensions. Prior distributions are adopted on both the number and the locations of the splines, which leads to a model averaging approach to prediction with predictive distributions that take into account model uncertainty. Conditioning on the data produces a Bayes local linear model with distributions on both predictions and local linear parameters. The method is spatially adaptive and covariate selection is achieved by using splines of lower dimension than the data.

Journal ArticleDOI
TL;DR: In this paper, two strategies for group screening are presented for a large number of factors, over two stages of experimentation, with particular emphasis on the detection of interactions, and results are derived on the relationship between the grouped and individual factorial effects, and the probability distributions of the numbers of grouped factors whose main effects or interactions are declared active at the first stage.
Abstract: One of the main advantages of factorial experiments is the information that they can offer on interactions. When there are many factors to be studied, some or all of this information is often sacrificed to keep the size of an experiment economically feasible. Two strategies for group screening are presented for a large number of factors, over two stages of experimentation, with particular emphasis on the detection of interactions. One approach estimates only main effects at the first stage (classical group screening), whereas the other new method (interaction group screening) estimates both main effects and key two-factor interactions at the first stage. Three criteria are used to guide the choice of screening technique, and also the size of the groups of factors for study in the first-stage experiment. The criteria seek to minimize the expected total number of observations in the experiment, the probability that the size of the experiment exceeds a prespecified target and the proportion of active individual factorial effects which are not detected. To implement these criteria, results are derived on the relationship between the grouped and individual factorial effects, and the probability distributions of the numbers of grouped factors whose main effects or interactions are declared active at the first stage. Examples are used to illustrate the methodology, and some issues and open questions for the practical implementation of the results are discussed.

Journal ArticleDOI
Fushing Hsieh
TL;DR: In this article, a class of non-proportional hazards regression models is considered to have hazard specifications consisting of a power form of cross-effects on the base-line hazard function.
Abstract: Summary. A class of non-proportional hazards regression models is considered to have hazard specifications consisting of a power form of cross-effects on the base-line hazard function. The primary goal of these models is to deal with settings in which heterogeneous distribution shapes of survival times may be present in populations characterized by some observable covariates. Although effects of such heterogeneity can be explicitly seen through crossing cumulative hazards phenomena in k-sample problems, they are barely visible in a one-sample regression setting. Hence, heterogeneity of this kind may not be noticed and, more importantly, may result in severely misleading inference. This is because the partial likelihood approach cannot eliminate the unknown cumulative base-line hazard functions in this setting. For coherent statistical inferences, a system of martingale processes is taken as a basis with which, together with the method of sieves, an overidentified estimating equation approach is proposed. A Pearson χ²-type goodness-of-fit test statistic is derived as a by-product. An example with data on gastric cancer patients' survival times is analysed.

Journal ArticleDOI
TL;DR: In this paper, the authors formulate subsampling estimators of the moments of general statistics computed from marked point process data and establish their L²-consistency; the variance estimator in particular can be used for the construction of confidence intervals for estimated parameters.
Abstract: In spatial statistics the data typically consist of measurements of some quantity at irregularly scattered locations; in other words, the data form a realization of a marked point process. In this paper, we formulate subsampling estimators of the moments of general statistics computed from marked point process data, and we establish their L²-consistency. The variance estimator in particular can be used for the construction of confidence intervals for estimated parameters. A practical data-based method for choosing a subsampling parameter is given and illustrated on a data set. Finite sample simulation examples are also presented.

Journal ArticleDOI
TL;DR: In this paper, a combination of Gaussian estimation and generalized estimating equations is proposed for the analysis of repeated measures data, improving the asymptotic behaviour of estimators by implementing the older likelihood-based approach via Gaussian estimation.
Abstract: In recent years various sophisticated methods have been developed for the analysis of repeated measures, or longitudinal data. The more traditional approach, based on a normal likelihood function, has been shown to be unsatisfactory, in the sense of yielding asymptotically biased estimates when the covariance structure is misspecified. More recent methodology, based on generalized linear models and quasi-likelihood estimation, has gained widespread acceptance as ‘generalized estimating equations’. However, this also has theoretical problems. In this paper a suggestion is made for improving the asymptotic behaviour of estimators by using the older approach, implemented via Gaussian estimation. The resulting estimating equations include the quasi-score function as one component, so the methodology proposed can be viewed as a combination of Gaussian estimation and generalized estimating equations which has a firmer asymptotic basis than either alone has.

Journal ArticleDOI
TL;DR: It is concluded that the new generalized autoregressive conditional heteroscedastic (GARCH) model with tree‐structured multiple thresholds for the estimation of volatility in financial time series has better predictive potential than other approaches.
Abstract: We propose a new generalized autoregressive conditional heteroscedastic (GARCH) model with tree-structured multiple thresholds for the estimation of volatility in financial time series. The approach relies on the idea of a binary tree where every terminal node parameterizes a (local) GARCH model for a partition cell of the predictor space. The fitting of such trees is constructed within the likelihood framework for non-Gaussian observations: it is very different from the well-known regression tree procedure which is based on residual sums of squares. Our strategy includes the classical GARCH model as a special case and allows us to increase model complexity in a systematic and flexible way. We derive a consistency result and conclude from simulation and real data analysis that the new method has better predictive potential than other approaches.
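
The classical GARCH(1,1) model, which the tree-structured approach contains as a special case (a single terminal node), has the familiar recursion sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2. Below is a minimal sketch of filtering conditional variances from a return series under given parameters; fitting the threshold tree itself is not attempted here.

# Conditional-variance recursion of a plain GARCH(1,1) model for given parameters.
import numpy as np

def garch11_variances(returns, omega, alpha, beta):
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(r)
    sigma2[0] = r.var()                                   # simple initialization
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2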

Journal ArticleDOI
TL;DR: In this article, the authors considered response-biased sampling, in which a subject is observed with a probability which is a function of its response, and derived a semiparametric maximum likelihood estimate of θ, along with its asymptotic normality, efficiency and variance estimates.
Abstract: Summary. Suppose that subjects in a population follow the model f(y*|x*; θ), where y* denotes a response, x* denotes a vector of covariates and θ is the parameter to be estimated. We consider response-biased sampling, in which a subject is observed with a probability which is a function of its response. Such response-biased sampling frequently occurs in econometrics, epidemiology and survey sampling. The semiparametric maximum likelihood estimate of θ is derived, along with its asymptotic normality, efficiency and variance estimates. The estimate proposed can be used as a maximum partial likelihood estimate in stratified response-selective sampling. Some computation algorithms are also provided.

Journal ArticleDOI
TL;DR: This paper examined the asymptotic and small sample properties of model-based and robust tests of the null hypothesis of no randomized treatment effect based on the partial likelihood arising from an arbitrarily misspecified Cox proportional hazards model.
Abstract: We examine the asymptotic and small sample properties of model-based and robust tests of the null hypothesis of no randomized treatment effect based on the partial likelihood arising from an arbitrarily misspecified Cox proportional hazards model. When the distribution of the censoring variable is either conditionally independent of the treatment group given covariates or conditionally independent of covariates given the treatment group, the numerators of the partial likelihood treatment score and Wald tests have asymptotic mean equal to 0 under the null hypothesis, regardless of whether or how the Cox model is misspecified. We show that the model-based variance estimators used in the calculation of the model-based tests are not, in general, consistent under model misspecification, yet using analytic considerations and simulations we show that their true sizes can be as close to the nominal value as tests calculated with robust variance estimators. As a special case, we show that the model-based log-rank test is asymptotically valid. When the Cox model is misspecified and the distribution of censoring depends on both treatment group and covariates, the asymptotic distributions of the resulting partial likelihood treatment score statistic and maximum partial likelihood estimator do not, in general, have a zero mean under the null hypothesis. Here neither the fully model-based tests, including the log-rank test, nor the robust tests will be asymptotically valid, and we show through simulations that the distortion to test size can be substantial.

Journal ArticleDOI
TL;DR: In this paper, the authors compare the finite sample efficiencies of OLS, GLS and incorrect GLS (IGLS) estimators and prove new theorems establishing theoretical efficiency bounds for IGLS relative to GLS and OLS.
Abstract: Summary. The regression literature contains hundreds of studies on serially correlated disturbances. Most of these studies assume that the structure of the error covariance matrix Q is known or can be estimated consistently from data. Surprisingly, few studies investigate the properties of estimated generalized least squares (GLS) procedures when the structure of Q is incorrectly identified and the parameters are inefficiently estimated. We compare the finite sample efficiencies of ordinary least squares (OLS), GLS and incorrect GLS (IGLS) estimators. We also prove new theorems establishing theoretical efficiency bounds for IGLS relative to GLS and OLS. Results from an exhaustive simulation study are used to evaluate the finite sample performance and to demonstrate the robustness of IGLS estimates vis-a-vis OLS and GLS estimates constructed for models with known and estimated (but correctly identified) Q. Some of our conclusions for finite samples differ from established asymptotic results.
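
For reference, the estimators being compared are beta_OLS = (X'X)^{-1} X'y and beta_GLS = (X' Q^{-1} X)^{-1} X' Q^{-1} y, with IGLS plugging a wrongly structured or inefficiently estimated Q into the GLS formula. Below is a minimal sketch with a known AR(1) error covariance, purely to illustrate the two formulas; the paper's efficiency bounds and simulation design are not reproduced.

# OLS and GLS estimators under serially correlated disturbances with an AR(1)
# error covariance Q; IGLS would use a misspecified Q in the gls() call.
import numpy as np

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

def gls(X, y, Q):
    Qinv = np.linalg.inv(Q)
    return np.linalg.solve(X.T @ Qinv @ X, X.T @ Qinv @ y)

def ar1_covariance(n, rho, sigma2=1.0):
    idx = np.arange(n)
    return sigma2 / (1 - rho ** 2) * rho ** np.abs(idx[:, None] - idx[None, :])

# Simulated example with AR(1) disturbances (rho = 0.7).
rng = np.random.default_rng(0)
n, rho = 200, 0.7
X = np.column_stack([np.ones(n), rng.normal(size=n)])
e = np.empty(n); e[0] = rng.normal(scale=1.0 / np.sqrt(1 - rho ** 2))
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.normal()
y = X @ np.array([1.0, 2.0]) + e
print(ols(X, y), gls(X, y, ar1_covariance(n, rho)))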