Showing papers in "Biometrika in 2005"


Journal ArticleDOI
TL;DR: In this paper, the authors derived an analytical expression for the concordance probability in the Cox proportional hazards model, which is a function of the regression parameters and covariate distribution only and does not use the observed event and censoring times.
Abstract: SUMMARY The concordance probability is used to evaluate the discriminatory power and the predictive accuracy of nonlinear statistical models. We derive an analytical expression for the concordance probability in the Cox proportional hazards model. The proposed estimator is a function of the regression parameters and the covariate distribution only and does not use the observed event and censoring times. For this reason it is asymptotically unbiased, unlike Harrell's c-index based on informative pairs. The asymptotic distribution of the concordance probability estimate is derived using U-statistic theory and the methodology is applied to a predictive model in lung cancer.

575 citations
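
The abstract contrasts the proposed model-based estimate with Harrell's c-index, which is computed from the informative (usable) pairs of observed times. As a point of reference, the sketch below computes Harrell's c-index for right-censored data and a Cox-type linear predictor; the toy data and variable names are illustrative only, and the paper's analytical estimator (a function of the regression parameters and the covariate distribution alone) is not reproduced here.

```python
import numpy as np

def harrell_c_index(time, event, risk_score):
    """Harrell's c-index for right-censored data.

    A pair (i, j) is informative when the subject with the smaller time
    has an observed event; the pair is concordant when that subject also
    has the higher predicted risk.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk_score, dtype=float)

    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # i must have the earlier time and an observed event
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5   # ties in predicted risk count as half
    return concordant / comparable

# Toy illustration with a hypothetical linear predictor x @ beta
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
beta = np.array([1.0, -0.5])
lin_pred = x @ beta
t = rng.exponential(scale=np.exp(-lin_pred))      # proportional-hazards-type event times
c = rng.exponential(scale=2.0, size=100)          # censoring times
time, event = np.minimum(t, c), t <= c
print("Harrell's c-index:", round(harrell_c_index(time, event, lin_pred), 3))
```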


Journal ArticleDOI
TL;DR: The conditional Akaike information and its corresponding criterion, the conditional AIC (CAIC), as discussed by the authors, are defined for both maximum likelihood and residual maximum likelihood estimation of linear mixed-effects models in the analysis of clustered data. The penalty term in CAIC is related to the effective degrees of freedom p for a linear mixed model proposed by Hodges & Sargent (2001); p reflects an intermediate level of complexity between a fixed-effects model with no cluster effect and a corresponding model with fixed cluster effects.
Abstract: SUMMARY This paper focuses on the Akaike information criterion, AIC, for linear mixed-effects models in the analysis of clustered data. We make the distinction between questions regarding the population and questions regarding the particular clusters in the data. We show that the AIC in current use is not appropriate for the focus on clusters, and we propose instead the conditional Akaike information and its corresponding criterion, the conditional AIC, CAIC. The penalty term in CAIC is related to the effective degrees of freedom p for a linear mixed model proposed by Hodges & Sargent (2001); p reflects an intermediate level of complexity between a fixed-effects model with no cluster effect and a corresponding model with fixed cluster effects. The CAIC is defined for both maximum likelihood and residual maximum likelihood estimation. A pharmacokinetics data application is used to illuminate the distinction between the two inference settings, and to illustrate the use of the conditional AIC in model selection.

559 citations
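
The penalty in the conditional AIC is tied to the effective degrees of freedom of Hodges & Sargent, which can be computed as the trace of the hat matrix mapping the response to the conditional fitted values (fixed effects plus predicted random effects). The sketch below assumes known variance components and a hypothetical one-way random-intercept design, and simply checks that this trace lies between the two extremes mentioned in the abstract; it illustrates the quantity p, not the full CAIC derivation.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_per = 10, 5                       # clusters and observations per cluster
N = m * n_per
X = np.column_stack([np.ones(N), rng.normal(size=N)])   # fixed effects: intercept + slope
Z = np.kron(np.eye(m), np.ones((n_per, 1)))              # random-intercept design

sigma2_e, sigma2_b = 1.0, 0.5          # assumed (known) variance components
lam = sigma2_e / sigma2_b              # ridge weight on the random effects

# Henderson-type equations: [beta; b] = (C'C + lam*B)^(-1) C'y, so the
# conditional fitted values are H y with H = C (C'C + lam*B)^(-1) C'.
C = np.hstack([X, Z])
B = np.diag(np.r_[np.zeros(X.shape[1]), np.ones(m)])     # penalise only the random effects
H = C @ np.linalg.solve(C.T @ C + lam * B, C.T)

rho = np.trace(H)                      # effective degrees of freedom (Hodges & Sargent)
print(f"effective df = {rho:.2f}  (between {X.shape[1]} and {X.shape[1] - 1 + m})")
# a conditional AIC would then combine -2 * conditional log-likelihood
# with a penalty based on rho
```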


Journal ArticleDOI
TL;DR: In this paper, the authors show that for any model selection criterion to be consistent, it must behave suboptimally for estimating the regression function in terms of minimax rate of convergence; and Bayesian model averaging cannot be minimax-rate optimal for regression estimation.
Abstract: A traditional approach to statistical inference is to identify the true or best model first with little or no consideration of the specific goal of inference in the model identification stage. Can the pursuit of the true model also lead to optimal regression estimation? In model selection, it is well known that BIC is consistent in selecting the true model, and AIC is minimax-rate optimal for estimating the regression function. A recent promising direction is adaptive model selection, in which, in contrast to AIC and BIC, the penalty term is data-dependent. Some theoretical and empirical results have been obtained in support of adaptive model selection, but it is still not clear if it can really share the strengths of AIC and BIC. Model combining or averaging has attracted increasing attention as a means to overcome the model selection uncertainty. Can Bayesian model averaging be optimal for estimating the regression function in a minimax sense? We show that the answers to these questions are basically in the negative: for any model selection criterion to be consistent, it must behave suboptimally for estimating the regression function in terms of minimax rate of convergence; and Bayesian model averaging cannot be minimax-rate optimal for regression estimation.

419 citations


Journal ArticleDOI
TL;DR: This paper introduces an information criterion for model selection based on composite likelihood, and describes applications to the modelling of time series of counts through dynamic generalised linear models and to the analysis of the well-known Old Faithful dataset.
Abstract: A composite likelihood consists of a combination of valid likelihood objects, usually related to small subsets of data. The merit of composite likelihood is to reduce the computational complexity so that it is possible to deal with large datasets and very complex models, even when the use of standard likelihood or Bayesian methods is not feasible. In this paper, we aim to suggest an integrated, general approach to inference and model selection using composite likelihood methods. In particular, we introduce an information criterion for model selection based on composite likelihood. We also describe applications to the modelling of time series of counts through dynamic generalised linear models and to the analysis of the well-known Old Faithful geyser dataset.

360 citations


Journal ArticleDOI
TL;DR: In this paper, the problem of estimating the maximum likelihood of gene-environment associations with disease when genetic and environmental exposures can be assumed to be independent in the underlying population was considered.
Abstract: We consider the problem of maximum-likelihood estimation in case-control studies of gene-environment associations with disease when genetic and environmental exposures can be assumed to be independent in the underlying population. Traditional logistic regression analysis may not be efficient in this setting. We study the semiparametric maximum likelihood estimates of logistic regression parameters that exploit the gene-environment independence assumption and leave the distribution of the environmental exposures to be nonparametric. We use a profile-likelihood technique to derive a simple algorithm for obtaining the estimator and we study the asymptotic theory. The results are extended to situations where genetic and environmental factors are independent conditional on some other factors. Simulation studies investigate small-sample properties. The method is illustrated using data from a case-control study designed to investigate the interplay of BRCA1/2 mutations and oral contraceptive use in the aetiology of ovarian cancer.

220 citations


Journal ArticleDOI
TL;DR: In this paper, a non-parametric Bayesian procedure for moment condition models is proposed, in which the probability weights are obtained via exponential tilting and the prior gives preference to distributions having a small support and, among those sharing the same support, favours entropy-maximising distributions.
Abstract: SUMMARY While empirical likelihood has been shown to exhibit many of the properties of conventional parametric likelihoods, a formal probabilistic interpretation has so far been lacking. We show that a likelihood function very closely related to empirical likelihood naturally arises from a nonparametric Bayesian procedure which places a type of noninformative prior on the space of distributions. This prior gives preference to distributions having a small support and, among those sharing the same support, it favours entropy-maximising distributions. The resulting nonparametric Bayesian procedure admits a computationally convenient representation as an empirical-likelihood-type likelihood where the probability weights are obtained via exponential tilting. The proposed methodology provides an attractive alternative to the Bayesian bootstrap as a nonparametric limit of a Bayesian procedure for moment condition models.

201 citations
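
The probability weights mentioned in the TL;DR are of exponential-tilting form: for a moment condition E{g(X, theta)} = 0, observation i receives weight proportional to exp{lambda * g(x_i, theta)}, with lambda chosen so that the weighted moment condition holds exactly. Below is a minimal sketch for a scalar mean-type moment condition; the Bayesian construction of the paper (the prior over distributions) is not reproduced, and the function names are ad hoc.

```python
import numpy as np
from scipy.optimize import brentq

def exponential_tilting_weights(g):
    """Weights w_i proportional to exp(lam * g_i) such that sum_i w_i * g_i = 0.

    g : 1-d array of moment-function values g(x_i, theta) at a fixed theta.
    """
    g = np.asarray(g, dtype=float)

    def weighted_moment(lam):
        w = np.exp(lam * g)
        w /= w.sum()
        return np.sum(w * g)

    # the weighted moment is monotone in lam, so bracket and solve
    lam = brentq(weighted_moment, -50.0, 50.0)
    w = np.exp(lam * g)
    return w / w.sum(), lam

# Example: moment condition E(X - theta) = 0 evaluated at theta = 0.3
rng = np.random.default_rng(2)
x = rng.normal(loc=0.5, scale=1.0, size=200)
w, lam = exponential_tilting_weights(x - 0.3)
print("tilting parameter:", round(lam, 4))
print("weighted moment:", round(np.sum(w * (x - 0.3)), 10))   # ~ 0
```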


Journal ArticleDOI
TL;DR: In this article, a simple pivotal-based approach that produces prediction intervals and predictive distributions with well-calibrated frequentist probability interpretations is introduced, and efficient simulation methods for producing predictive distributions are considered.
Abstract: SUMMARY We consider parametric frameworks for the prediction of future values of a random variable Y, based on previously observed data X. Simple pivotal methods for obtaining calibrated prediction intervals are presented and illustrated. Frequentist predictive distributions are defined as confidence distributions, and their utility is demonstrated. A simple pivotal-based approach that produces prediction intervals and predictive distributions with well-calibrated frequentist probability interpretations is introduced, and efficient simulation methods for producing predictive distributions are considered. Properties related to an average Kullback-Leibler measure of goodness for predictive or estimated distributions are given. The predictive distributions here are shown to be optimal in certain settings with invariance structure, and to dominate plug-in distributions under certain conditions.

197 citations
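
A concrete instance of a pivotal-based prediction interval with exact frequentist calibration is the textbook normal case: with X_1, ..., X_n i.i.d. N(mu, sigma^2) and a future Y from the same distribution, (Y - Xbar) / {S * sqrt(1 + 1/n)} is a t pivot with n - 1 degrees of freedom. The sketch below covers this simple case only and is not a rendering of the paper's general framework or its simulation methods.

```python
import numpy as np
from scipy import stats

def normal_prediction_interval(x, coverage=0.95):
    """Exact prediction interval for one future observation from N(mu, sigma^2),
    based on the pivot (Y - xbar) / (s * sqrt(1 + 1/n)) ~ t_{n-1}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar, s = x.mean(), x.std(ddof=1)
    tq = stats.t.ppf(0.5 + coverage / 2, df=n - 1)
    half_width = tq * s * np.sqrt(1 + 1 / n)
    return xbar - half_width, xbar + half_width

# Check the frequentist calibration by simulation
rng = np.random.default_rng(3)
hits = 0
for _ in range(2000):
    x = rng.normal(5.0, 2.0, size=15)
    y_future = rng.normal(5.0, 2.0)
    lo, hi = normal_prediction_interval(x, coverage=0.90)
    hits += lo <= y_future <= hi
print("empirical coverage:", hits / 2000)   # close to 0.90
```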


Journal ArticleDOI
TL;DR: In this paper, second-order accurate approximations to the mean squared error of model-based small area estimators are obtained using the Fay-Herriot method of estimating the model variance from a weighted residual sum of squares, together with mean squared error estimators that are unbiased to second order.
Abstract: SUMMARY In this paper, based on a basic area level model, we obtain second-order accurate approximations to the mean squared error of model-based small area estimators, using the Fay & Herriot (1979) iterative method of estimating the model variance based on weighted residual sum of squares. We also obtain mean squared error estimators unbiased to second order. Based on simulations, we compare the finite-sample performance of our mean squared error estimators with those based on method-of-moments, maximum likelihood and residual maximum likelihood estimators of the model variance. Our results suggest that the Fay-Herriot method performs better, in terms of relative bias of mean squared error estimators, than the other methods across different combinations of number of areas, pattern of sampling variances and distribution of small area effects. We also derive a noninformative prior on the model parameters for which the posterior variance of a small area mean is second-order unbiased for the mean squared error. The posterior variance based on such a prior possesses both Bayesian and frequentist interpretations.

172 citations
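
The Fay-Herriot estimate of the model variance referred to here solves a weighted residual sum of squares equation: in the area-level model y_i = x_i'beta + v_i + e_i with v_i ~ (0, A) and known sampling variances D_i, the variance A is chosen so that the weighted residual sum of squares equals its degrees of freedom, m - p. The sketch below solves that equation on hypothetical data with a generic root-finder rather than the original iterative scheme, and none of the paper's second-order mean squared error machinery is attempted.

```python
import numpy as np
from scipy.optimize import brentq

def fay_herriot_variance(y, X, D):
    """Estimate the model variance A in the area-level model
    y_i = x_i' beta + v_i + e_i, v_i ~ (0, A), e_i ~ (0, D_i), D_i known,
    by solving  sum_i (y_i - x_i' beta_hat(A))^2 / (A + D_i) = m - p."""
    y, X, D = map(np.asarray, (y, X, D))
    m, p = X.shape

    def wrss_equation(A):
        w = 1.0 / (A + D)                          # weighted least squares weights
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)   # beta_hat(A)
        resid = y - X @ beta
        return np.sum(w * resid**2) - (m - p)

    if wrss_equation(0.0) <= 0:                    # no positive solution: truncate at zero
        return 0.0
    return brentq(wrss_equation, 0.0, 100 * np.var(y))

# Hypothetical small-area data
rng = np.random.default_rng(4)
m = 30
X = np.column_stack([np.ones(m), rng.uniform(size=m)])
D = rng.uniform(0.5, 2.0, size=m)                  # known sampling variances
A_true, beta_true = 1.0, np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(0, np.sqrt(A_true), m) + rng.normal(0, np.sqrt(D))
print("estimated model variance A:", round(fay_herriot_variance(y, X, D), 3))
```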


Journal ArticleDOI
TL;DR: In this paper, a simple statistic is proposed for testing the complete independence of random variables having a multivariate normal distribution; its asymptotic null distribution, as both the sample size and the number of variables go to infinity, is shown to be normal.
Abstract: A simple statistic is proposed for testing the complete independence of random variables having a multivariate normal distribution. The asymptotic null distribution of this statistic, as both the sample size and the number of variables go to infinity, is shown to be normal. Consequently, this test can be used when the number of variables is not small relative to the sample size and, in particular, even when the number of variables exceeds the sample size. The finite sample size performance of the normal approximation is evaluated in a simulation study and the results are compared to those of the likelihood ratio test.

165 citations
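
The flavour of such a test can be illustrated with the sum of squared pairwise sample correlations, which grows when complete independence fails and remains computable when the number of variables exceeds the sample size. Rather than quote the paper's exact normal centring and scaling constants, the sketch below calibrates this statistic by Monte Carlo under the null; it is an illustration of the testing problem, not the paper's statistic.

```python
import numpy as np

def sum_sq_correlations(X):
    """Sum of squared off-diagonal sample correlations (i < j)."""
    R = np.corrcoef(X, rowvar=False)
    iu = np.triu_indices_from(R, k=1)
    return np.sum(R[iu] ** 2)

def independence_test_mc(X, n_sim=2000, seed=0):
    """Monte Carlo p-value for complete independence of the columns of X,
    simulating the null with independent standard normal data of the same shape
    (valid because sample correlations are location- and scale-free)."""
    n, p = X.shape
    observed = sum_sq_correlations(X)
    rng = np.random.default_rng(seed)
    null_stats = np.array([sum_sq_correlations(rng.standard_normal((n, p)))
                           for _ in range(n_sim)])
    return observed, np.mean(null_stats >= observed)

# Example with the number of variables exceeding the sample size
rng = np.random.default_rng(5)
n, p = 40, 60
X_null = rng.standard_normal((n, p))               # independent columns
stat, pval = independence_test_mc(X_null)
print(f"independent data: stat = {stat:.1f}, p-value = {pval:.3f}")
```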


Journal ArticleDOI
TL;DR: A simple Monte Carlo method is given for computing the normalising constant of the G-Wishart, the distribution with Wishart density with respect to the Lebesgue measure restricted to M+(G), together with a way of sampling from the posterior distribution of the precision matrix.
Abstract: SUMMARY A centred Gaussian model that is Markov with respect to an undirected graph G is characterised by the parameter set of its precision matrices which is the cone M+(G) of positive definite matrices with entries corresponding to the missing edges of G constrained to be equal to zero. In a Bayesian framework, the conjugate family for the precision parameter is the distribution with Wishart density with respect to the Lebesgue measure restricted to M+(G). We call this distribution the G-Wishart. When G is nondecomposable, the normalising constant of the G-Wishart cannot be computed in closed form. In this paper, we give a simple Monte Carlo method for computing this normalising constant. The main feature of our method is that the sampling distribution is exact and consists of a product of independent univariate standard normal and chi-squared distributions that can be read off the graph G. Computing this normalising constant is necessary for obtaining the posterior distribution of G or the marginal likelihood of the corresponding graphical Gaussian model. Our method also gives a way of sampling from the posterior distribution of the precision matrix.

165 citations


Journal ArticleDOI
TL;DR: In this article, the authors explore the joint modeling approach under the accelerated failure time assumption when covariates are assumed to follow a linear mixed effects model with measurement errors, and the procedure is based on maximising the joint likelihood function with random effects treated as missing data.
Abstract: SUMMARY The accelerated failure time model is an attractive alternative to the Cox model when the proportionality assumption fails to capture the relationship between the survival time and longitudinal covariates. Several complications arise when the covariates are measured intermittently at different time points for different subjects, possibly with measurement errors, or measurements are not available after the failure time. Joint modelling of the failure time and longitudinal data offers a solution to such complications. We explore the joint modelling approach under the accelerated failure time assumption when covariates are assumed to follow a linear mixed effects model with measurement errors. The procedure is based on maximising the joint likelihood function with random effects treated as missing data. A Monte Carlo EM algorithm is used to estimate all the unknown parameters, including the unknown baseline hazard function. The performance of the proposed procedure is checked in simulation studies. A case study of reproductive egg-laying data for female Mediterranean fruit flies and their relationship to longevity demonstrates the effectiveness of the new procedure.

Journal ArticleDOI
TL;DR: In this paper, two asymptotic frameworks for obtaining limiting distributions of maximum likelihood estimators of covariance parameters in Gaussian spatial models with or without a nugget effect are presented.
Abstract: SUMMARY Two asymptotic frameworks, increasing domain asymptotics and infill asymptotics, have been advanced for obtaining limiting distributions of maximum likelihood estimators of covariance parameters in Gaussian spatial models with or without a nugget effect. These limiting distributions are known to be different in some cases. It is therefore of interest to know, for a given finite sample, which framework is more appropriate. We consider the possibility of making this choice on the basis of how well the limiting distributions obtained under each framework approximate their finite-sample counterparts. We investigate the quality of these approximations both theoretically and empirically, showing that, for certain consistently estimable parameters of exponential covariograms, approximations corresponding to the two frameworks perform about equally well. For those parameters that cannot be estimated consistently, however, the infill asymptotic approximation is preferable.

Journal ArticleDOI
TL;DR: In this article, the authors proposed exact likelihood and restricted likelihood ratio tests for testing polynomial regression versus a general alternative modelled by penalised splines, and derived the asymptotic local power properties of the tests under weak conditions.
Abstract: SUMMARY Penalised-spline-based additive models allow a simple mixed model representation where the variance components control departures from linear models. The smoothing parameter is the ratio of the random-coefficient and error variances and tests for linear regression reduce to tests for zero random-coefficient variances. We propose exact likelihood and restricted likelihood ratio tests for testing polynomial regression versus a general alternative modelled by penalised splines. Their spectral decompositions are used as the basis of fast simulation algorithms. We derive the asymptotic local power properties of the tests under weak conditions. In particular we characterise the local alternatives that are detected with asymptotic probability one. Confidence intervals for the smoothing parameter are obtained by inverting the tests for a fixed smoothing parameter versus a general alternative. We discuss F and R tests and show that ignoring the variability in the smoothing parameter estimator can have a dramatic effect on their null distributions. The powers of several known tests are investigated and a small set of tests with good power properties is identified. The restricted likelihood ratio test is among the best in terms of power.

Journal ArticleDOI
TL;DR: In this article, an adaptive Monte Carlo sampling scheme for Bayesian variable selection in linear regression is proposed, which improves on standard Markov chain methods by considering Metropolis-Hastings proposals that make use of accumulated information about the posterior distribution obtained during sampling.
Abstract: Our paper proposes adaptive Monte Carlo sampling schemes for Bayesian variable selection in linear regression that improve on standard Markov chain methods. We do so by considering Metropolis-Hastings proposals that make use of accumulated information about the posterior distribution obtained during sampling. Adaptation needs to be done carefully to ensure that sampling is from the correct ergodic distribution. We give conditions for the validity of an adaptive sampling scheme in this problem, and for simulating from a distribution on a finite state space in general, and suggest a class of adaptive proposal densities which uses best linear prediction to approximate the Gibbs sampler. Our sampling scheme is computationally much faster per iteration than the Gibbs sampler, and when this is taken into account the efficiency gains when using our sampling scheme compared to alternative approaches are substantial in terms of precision of estimation of posterior quantities of interest for a given amount of computation time. We compare our method with other sampling schemes for examples involving both real and simulated data. The methodology developed in the paper can be extended to variable selection in more general problems.

Journal ArticleDOI
TL;DR: In this paper, the effect of the penalty on the bias and variance of the estimator is discussed, both for general splines and for the case of polynomial splines.
Abstract: SUMMARY Penalised spline regression is a popular new approach to smoothing, but its theoretical properties are not yet well understood. In this paper, mean squared error expressions and consistency results are derived by using a white-noise model representation for the estimator. The effect of the penalty on the bias and variance of the estimator is discussed, both for general splines and for the case of polynomial splines. The penalised spline regression estimator is shown to achieve the optimal nonparametric convergence rate established by Stone (1982).
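
For readers unfamiliar with the estimator being analysed, a penalised spline fit in its simplest form is ridge regression on a truncated-power spline basis, with the penalty applied only to the knot coefficients. The sketch below uses a truncated linear basis and a hand-picked smoothing parameter; it illustrates the estimator itself, not the white-noise representation or the rate results of the paper.

```python
import numpy as np

def pspline_fit(x, y, n_knots=20, lam=1.0, degree=1):
    """Penalised spline fit with a truncated-power basis.

    Minimises ||y - C theta||^2 + lam * theta' P theta, where P penalises
    only the coefficients of the truncated-power (knot) terms.
    """
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    poly = np.vander(x, degree + 1, increasing=True)             # 1, x, ..., x^degree
    trunc = np.maximum(x[:, None] - knots[None, :], 0.0) ** degree
    C = np.hstack([poly, trunc])
    P = np.diag(np.r_[np.zeros(degree + 1), np.ones(n_knots)])   # penalise knot terms only
    theta = np.linalg.solve(C.T @ C + lam * P, C.T @ y)

    def predict(x_new):
        poly_new = np.vander(x_new, degree + 1, increasing=True)
        trunc_new = np.maximum(x_new[:, None] - knots[None, :], 0.0) ** degree
        return np.hstack([poly_new, trunc_new]) @ theta
    return predict

# Toy data: noisy sine curve
rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=200)
fit = pspline_fit(x, y, n_knots=20, lam=0.1)
print("fitted value at x = 0.25:", round(float(fit(np.array([0.25]))[0]), 3))
# roughly sin(pi/2) = 1, up to noise and smoothing bias
```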

Journal ArticleDOI
TL;DR: The numerical and theoretical results show that the saddlepoint density approximations to the Fisher-Bingham normalising constant are highly accurate in a broad spectrum of cases.
Abstract: The Fisher-Bingham distribution is obtained when a multivariate normal random vector is conditioned to have unit length. Its normalising constant can be expressed as an elementary function multiplied by the density, evaluated at 1, of a linear combination of independent noncentral chi-squared random variables, each with one degree of freedom. Hence we may approximate the normalising constant by applying a saddlepoint approximation to this density. Three such approximations, implementation of each of which is straightforward, are investigated: the first-order saddlepoint density approximation, the second-order saddlepoint density approximation and a variant of the second-order approximation which has proved slightly more accurate than the other two. The numerical and theoretical results we present show that this approach provides highly accurate approximations in a broad spectrum of cases.
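
The computational ingredient here is a saddlepoint approximation to the density, at the point 1, of a linear combination of independent noncentral chi-squared random variables with one degree of freedom. The sketch below implements the first-order saddlepoint density approximation for such a linear combination under assumed weights and noncentralities; the elementary multiplier that converts this density into the Fisher-Bingham normalising constant is not reproduced.

```python
import numpy as np
from scipy.optimize import brentq

def saddlepoint_density(x, lam, delta):
    """First-order saddlepoint approximation to the density at x of
    sum_i lam_i * W_i, with W_i independent noncentral chi^2_1(delta_i)
    and all lam_i > 0."""
    lam, delta = np.asarray(lam, float), np.asarray(delta, float)

    def K(t):   # cumulant generating function
        return np.sum(-0.5 * np.log(1 - 2 * lam * t)
                      + delta * lam * t / (1 - 2 * lam * t))

    def K1(t):  # first derivative
        return np.sum(lam / (1 - 2 * lam * t)
                      + delta * lam / (1 - 2 * lam * t) ** 2)

    def K2(t):  # second derivative
        return np.sum(2 * lam**2 / (1 - 2 * lam * t) ** 2
                      + 4 * delta * lam**2 / (1 - 2 * lam * t) ** 3)

    t_max = 1.0 / (2 * lam.max())                       # upper end of the CGF domain
    s_hat = brentq(lambda t: K1(t) - x, -1e6, t_max - 1e-9)   # saddlepoint K'(s) = x
    return np.exp(K(s_hat) - s_hat * x) / np.sqrt(2 * np.pi * K2(s_hat))

# Hypothetical weights and noncentralities; density evaluated at 1
lam = np.array([0.2, 0.5, 1.5])
delta = np.array([0.3, 0.0, 1.0])
print("saddlepoint density at 1:", round(saddlepoint_density(1.0, lam, delta), 5))
```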

Journal ArticleDOI
TL;DR: In this article, the coherence principles may be applied to calibrate a two-stage design and to deal with situations with delayed toxicity, and the authors show examples in which coherence is violated.
Abstract: This paper shows examples in which coherence is violated, and discusses how the coherence principles may be applied to calibrate a two-stage design and to deal with situations with delayed toxicity.

Journal ArticleDOI
TL;DR: Under certain regularity conditions, it is demonstrated that, for certain penalty functions with proper choices of regularisation parameters, the resulting estimator can correctly identify the true model, as if it were known in advance.
Abstract: In this paper, we propose a penalised pseudo-partial likelihood method for variable selection with multivariate failure time data with a growing number of regression coefficients. Under certain regularity conditions, we show the consistency and asymptotic normality of the penalised likelihood estimators. We further demonstrate that, for certain penalty functions with proper choices of regularisation parameters, the resulting estimator can correctly identify the true model, as if it were known in advance. Based on a simple approximation of the penalty function, the proposed method can be easily carried out with the Newton-Raphson algorithm. We conduct extensive Monte Carlo simulation studies to assess the finite sample performance of the proposed procedures. We illustrate the proposed method by analysing a dataset from the Framingham Heart Study.

Journal ArticleDOI
TL;DR: In this article, a pseudo-Bayesian interpretation of standard errors yields a natural induced smoothing of statistical estimating functions, which can be applied to rank estimation, and the lack of smoothness which prevents standard error estimation is remedied.
Abstract: A 'pseudo-Bayesian' interpretation of standard errors yields a natural induced smoothing of statistical estimating functions. When applied to rank estimation, the lack of smoothness which prevents standard error estimation is remedied. Efficiency and robustness are preserved, while the smoothed estimation has excellent computational properties. In particular, convergence of the iterative equation for standard error is fast, and standard error calculation becomes asymptotically a one-step procedure. This property also extends to covariance matrix calculation for rank estimates in multi-parameter problems. Examples, and some simple explanations, are given.
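
The induced-smoothing idea can be seen in a one-parameter toy problem: the non-smooth sign-based estimating function for a median, sum_i sign(y_i - theta), becomes smooth once theta is perturbed by a normal variable whose scale matches the estimator's sampling variability, which replaces each sign by 2*Phi(.) - 1. The sketch below is this scalar analogue only, with an ad hoc choice of smoothing scale; it is not the multi-parameter rank estimation treated in the paper.

```python
import numpy as np
from scipy import stats, optimize

def smoothed_median(y, n_iter=20):
    """Median via an induced-smoothed sign estimating equation.

    The raw estimating function sum_i sign(y_i - theta) is replaced by
    sum_i {2 * Phi((y_i - theta)/h) - 1}, where h reflects the sampling
    variability of the estimator (here a simple iterated plug-in choice).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    theta = np.median(y)                        # starting value
    h = y.std(ddof=1) / np.sqrt(n)              # initial smoothing scale

    for _ in range(n_iter):
        def u_smooth(t):
            return np.sum(2 * stats.norm.cdf((y - t) / h) - 1)
        theta = optimize.brentq(u_smooth, y.min() - 1, y.max() + 1)
        # asymptotic sd of the sample median is 1 / (2 * sqrt(n) * f(theta));
        # estimate the density f(theta) from the smoothed score's slope
        f_hat = np.mean(stats.norm.pdf((y - theta) / h)) / h
        h = 1.0 / (2 * f_hat * np.sqrt(n))      # update the smoothing scale
    return theta, h

rng = np.random.default_rng(7)
y = rng.standard_t(df=3, size=200) + 1.0
est, h = smoothed_median(y)
print("smoothed-median estimate:", round(est, 3),
      " (sample median:", round(np.median(y), 3), ")")
```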

Journal ArticleDOI
TL;DR: In this paper, the authors introduce a two-sample model that accommodates crossing survival curves, where the two scalar parameters of the model have the interpretations of being the short-term and long-term hazard ratios respectively.
Abstract: SUMMARY Standard approaches to semiparametric modelling of two-sample survival data are not appropriate when the two survival curves cross. We introduce a two-sample model that accommodates crossing survival curves. The two scalar parameters of the model have the interpretations of being the short-term and long-term hazard ratios respectively. The time-varying hazard ratio is expressed semiparametrically by the two scalar parameters and an unspecified baseline distribution. The new model includes the Cox model and the proportional odds model as submodels. For inference we use a pseudo maximum likelihood approach that can be expressed via some simple estimating equations, analogous to that for the maximum partial likelihood estimator of the Cox model, that provide consistent and asymptotically normal estimators. Simulation studies show that the estimators perform well for moderate sample sizes. We also illustrate the methods with a real-data example. The new model can be extended easily to the regression setting.

Journal ArticleDOI
TL;DR: The authors employed Lasso shrinkage within the context of sufficient dimension reduction to obtain a shrinkage sliced inverse regression estimator, which provides easier interpretations and better prediction accuracy without assuming a parametric model.
Abstract: SUMMARY We employ Lasso shrinkage within the context of sufficient dimension reduction to obtain a shrinkage sliced inverse regression estimator, which provides easier interpretations and better prediction accuracy without assuming a parametric model. The shrinkage sliced inverse regression approach can be employed for both single-index and multiple-index models. Simulation studies suggest that the new estimator performs well when its tuning parameter is selected by either the Bayesian information criterion or the residual information criterion.
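
As background, sliced inverse regression estimates dimension-reduction directions from the eigenvectors of the covariance of slice means of the standardised predictors. The sketch below implements plain sliced inverse regression for a single-index toy model; the Lasso shrinkage step of the paper, which makes the estimated directions sparse, is only indicated in a comment rather than implemented.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    """Plain sliced inverse regression (SIR)."""
    n = len(X)
    mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    # standardise the predictors: Z = (X - mu) cov^{-1/2}
    evals, evecs = np.linalg.eigh(cov)
    cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ cov_inv_sqrt

    # slice on y and collect the slice means of Z
    order = np.argsort(y)
    slice_means, weights = [], []
    for idx in np.array_split(order, n_slices):
        slice_means.append(Z[idx].mean(axis=0))
        weights.append(len(idx) / n)
    M = sum(w * np.outer(sm, sm) for w, sm in zip(weights, slice_means))

    # leading eigenvectors of M, mapped back to the original scale
    evals_M, evecs_M = np.linalg.eigh(M)
    directions = cov_inv_sqrt @ evecs_M[:, ::-1][:, :n_directions]
    # (a shrinkage SIR estimator would add an L1 penalty at this stage to
    #  zero out coefficients of irrelevant predictors)
    return directions / np.linalg.norm(directions, axis=0)

# Single-index toy model: y depends on X only through x1 - x2
rng = np.random.default_rng(8)
X = rng.normal(size=(500, 6))
y = np.tanh(X[:, 0] - X[:, 1]) + rng.normal(0, 0.2, size=500)
b = sliced_inverse_regression(X, y).ravel()
print("estimated direction (true is prop. to [1, -1, 0, 0, 0, 0], up to sign):",
      np.round(b, 2))
```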

Journal ArticleDOI
TL;DR: In this paper, the authors consider mixture models in which the components of data vectors from any given subpopulation are statistically independent, or independent in blocks, and show that, under this condition of independence, the distributions and mixing proportions can often be estimated root-n consistently.
Abstract: SUMMARY We consider mixture models in which the components of data vectors from any given subpopulation are statistically independent, or independent in blocks. We argue that if, under this condition of independence, we take a nonparametric view of the problem and allow the number of subpopulations to be quite general, the distributions and mixing proportions can often be estimated root-n consistently. Indeed, we show that, if the data are k-variate and there are p subpopulations, then for each p ≥ 2 there is a minimal value of k, k_p say, such that the mixture problem is always nonparametrically identifiable, and all distributions and mixture proportions are nonparametrically identifiable when k ≥ k_p. We treat the case p = 2 in detail, and there we show how to construct explicit distribution, density and mixture-proportion estimators, converging at conventional rates. Other values of p can be addressed using a similar approach, although the methodology becomes rapidly more complex as p increases.

Journal ArticleDOI
TL;DR: In this paper, a class of estimators based on penalised spline regression is proposed, which are weighted linear combinations of sample observations, with weights calibrated to known control totals.
Abstract: SUMMARY Estimation of finite population totals in the presence of auxiliary information is considered. A class of estimators based on penalised spline regression is proposed. These estimators are weighted linear combinations of sample observations, with weights calibrated to known control totals. They allow straightforward extensions to multiple auxiliary variables and to complex designs. Under standard design conditions, the estimators are design consistent and asymptotically normal, and they admit consistent variance estimation using familiar design-based methods. Data-driven penalty selection is considered in the context of unequal probability sampling designs. Simulation experiments show that the estimators are more efficient than parametric regression estimators when the parametric model is incorrectly specified, while being approximately as efficient when the parametric specification is correct. An example using Forest Health Monitoring survey data from the U.S. Forest Service demonstrates the applicability of the methodology in the context of a two-phase survey with multiple auxiliary variables.

Journal ArticleDOI
TL;DR: In this article, a general dimension reduction method that combines the ideas of likelihood, correlation, inverse regression and information theory is proposed, which does not require that the dependence be confined to particular conditional moments, nor do restrictions on the predictors or on the regression that are necessary for methods like ordinary least squares and sliced-inverse regression.
Abstract: We propose a general dimension-reduction method that combines the ideas of likelihood, correlation, inverse regression and information theory. We do not require that the dependence be confined to particular conditional moments, nor do we place restrictions on the predictors or on the regression that are necessary for methods like ordinary least squares and sliced-inverse regression. Although we focus on single-index regressions, the underlying idea is applicable more generally. Illustrative examples are presented.

Journal ArticleDOI
TL;DR: In this paper, the covariance between two variables in a multivariate Gaussian distribution is decomposed into a sum of path weights for all paths connecting the two variables, which are useful in determining which variables are important in mediating correlation between the two path endpoints.
Abstract: The covariance between two variables in a multivariate Gaussian distribution is decomposed into a sum of path weights for all paths connecting the two variables in an undirected independence graph. These weights are useful in determining which variables are important in mediating correlation between the two path endpoints. The decomposition arises in undirected Gaussian graphical models and does not require or involve any assumptions of causality. This covariance decomposition is derived using basic linear algebra. The decomposition is feasible for very large numbers of variables if the corresponding precision matrix is sparse, a circumstance that arises in examples such as gene expression studies in functional genomics. Additional computational efficiencies are possible when the undirected graph is derived from an acyclic directed graph.
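
The decomposition can be written explicitly: with precision matrix Omega, the covariance between variables i and j is the sum, over all simple paths P connecting i and j in the graph, of the weight (-1)^(number of edges in P) times the product of the precision entries along P times det(Omega with the path's vertices removed), divided by det(Omega). The sketch below enumerates paths in a small assumed graph and checks this identity numerically against the inverse of Omega; the sign and determinant conventions were verified only on small examples, so treat it as illustrative.

```python
import numpy as np
from itertools import combinations

def simple_paths(adj, i, j):
    """All simple paths from i to j in an undirected graph (adjacency dict)."""
    paths, stack = [], [(i, [i])]
    while stack:
        node, path = stack.pop()
        if node == j:
            paths.append(path)
            continue
        for nb in adj[node]:
            if nb not in path:
                stack.append((nb, path + [nb]))
    return paths

def path_weight(omega, path):
    """Weight of one path: (-1)^edges * prod(omega along the path's edges)
    * det(omega with the path's vertices removed) / det(omega)."""
    edges_prod = np.prod([omega[a, b] for a, b in zip(path, path[1:])])
    keep = [k for k in range(omega.shape[0]) if k not in path]
    det_rest = np.linalg.det(omega[np.ix_(keep, keep)]) if keep else 1.0
    return (-1) ** (len(path) - 1) * edges_prod * det_rest / np.linalg.det(omega)

# A small sparse precision matrix on the 4-cycle graph 0-1-2-3-0 (assumed example)
omega = np.array([[2.0, 0.6, 0.0, 0.4],
                  [0.6, 2.0, 0.5, 0.0],
                  [0.0, 0.5, 2.0, 0.3],
                  [0.4, 0.0, 0.3, 2.0]])
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
sigma = np.linalg.inv(omega)

for i, j in combinations(range(4), 2):
    paths = simple_paths(adj, i, j)
    total = sum(path_weight(omega, p) for p in paths)
    assert np.isclose(total, sigma[i, j]), (i, j)
    print(f"cov({i},{j}) = {sigma[i, j]: .4f} = sum of {len(paths)} path weights")
```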

Journal ArticleDOI
TL;DR: In this article, the authors used counting process theory to develop semiparametric inference procedures for the regression coefficients of the Oakes-Dasu model and applied it to the well-known Veterans' Administration lung cancer survival data.
Abstract: SUMMARY As function of time t, a mean residual life is the remaining life expectancy of a subject given survival up to t. The proportional mean residual life model, proposed by Oakes & Dasu (1990), provides an alternative to the Cox proportional hazards model for studying the association between survival times and covariates. In the presence of censoring, we use counting process theory to develop semiparametric inference procedures for the regression coefficients of the Oakes-Dasu model. Simulation studies and an application to the well-known Veterans' Administration lung cancer survival data are presented.

Journal ArticleDOI
TL;DR: In this paper, the authors show how to incorporate extra information into maximum likelihood procedures for multivariate data that can be regarded as the componentwise maxima of some unobserved underlying multivariate process.
Abstract: Multivariate extreme value distributions arise as the limiting distributions of normalised componentwise maxima. They are often used to model multivariate data that can be regarded as the componentwise maxima of some unobserved underlying multivariate process. In many applications we have extra information. We often know the locations of the maxima within the underlying process. If the process is temporal this knowledge is frequently available through the dates on which the maxima are recorded. We show how to incorporate this extra information into maximum likelihood procedures. Asymptotic and small-sample efficiency results are presented for the dependence parameter in the logistic parametric sub-class of bivariate extreme value distributions. We conclude with an application to sea levels.

Journal ArticleDOI
TL;DR: In this paper, the error variance is estimated as the intercept in a simple linear regression model with squared differences of paired observations as the dependent variable and squared distances between the paired covariates as the regressor.
Abstract: SUMMARY We propose a new estimator for the error variance in a nonparametric regression model. We estimate the error variance as the intercept in a simple linear regression model with squared differences of paired observations as the dependent variable and squared distances between the paired covariates as the regressor. For the special case of a one-dimensional domain with equally spaced design points, we show that our method reaches an asymptotic optimal rate which is not achieved by some existing methods. We conduct extensive simulations to evaluate finite-sample performance of our method and compare it with existing methods. Our method can be extended to nonparametric regression models with multivariate functions defined on arbitrary subsets of normed spaces, possibly observed on unequally spaced or clustered design points.
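
The construction described in the abstract rests on the identity E{(y_i - y_j)^2}/2 = sigma^2 + {m(x_i) - m(x_j)}^2 / 2 for independent errors, so regressing half squared differences of nearby pairs on squared covariate distances gives an intercept that estimates the error variance. The sketch below does this least-squares fit for a one-dimensional design using only small lags; the paper's optimal weighting and rate analysis are not reproduced.

```python
import numpy as np

def diff_based_error_variance(x, y, max_lag=5):
    """Estimate sigma^2 in y_i = m(x_i) + eps_i by regressing half squared
    differences of nearby pairs on squared covariate distances; the
    intercept estimates the error variance."""
    order = np.argsort(x)
    x, y = np.asarray(x)[order], np.asarray(y)[order]

    half_sq_diff, sq_dist = [], []
    for lag in range(1, max_lag + 1):
        half_sq_diff.append(0.5 * (y[lag:] - y[:-lag]) ** 2)
        sq_dist.append((x[lag:] - x[:-lag]) ** 2)
    d = np.concatenate(sq_dist)
    s = np.concatenate(half_sq_diff)

    # simple linear regression  s = intercept + slope * d
    A = np.column_stack([np.ones_like(d), d])
    intercept, slope = np.linalg.lstsq(A, s, rcond=None)[0]
    return intercept

rng = np.random.default_rng(9)
x = np.sort(rng.uniform(0, 1, 500))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.5, size=500)   # true sigma^2 = 0.25
print("estimated error variance:", round(diff_based_error_variance(x, y), 3))
```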

Journal ArticleDOI
TL;DR: In this article, a covariate adjustment method is proposed for situations where both predictors and response in a regression model are not directly observable, but are contaminated with a multiplicative factor that is determined by the value of an unknown function of an observable covariate.
Abstract: mue11er@wa1d.ucdavis.edu SUMMARY We introduce covariate-adjusted regression for situations where both predictors and response in a regression model are not directly observable, but are contaminated with a multiplicative factor that is determined by the value of an unknown function of an observable covariate. We demonstrate how the regression coefficients can be estimated by establishing a connection to varying-coefficient regression. The proposed covariate adjustment method is illustrated with an analysis of the regression of plasma fibrinogen concentration as response on serum transferrin level as predictor for 69 haemodialysis patients. In this example, both response and predictor are thought to be influenced in a multiplicative fashion by body mass index. A bootstrap hypothesis test enables us to test the significance of the regression parameters. We establish consistency and convergence rates of the parameter estimators for this new covariate-adjusted regression model. Simulation studies demonstrate the efficacy of the proposed method.

Journal ArticleDOI
TL;DR: This work displays a set of estimated neuronal Poisson-process intensity functions where the first eigenvalue of a sample covariance matrix computed from estimated functions may be biased upwards, and discusses two methods for accounting for estimation variation.
Abstract: SUMMARY In many applications of functional data analysis, summarising functional variation based on fits, without taking account of the estimation process, runs the risk of attributing the estimation variation to the functional variation, thereby overstating the latter. For example, the first eigenvalue of a sample covariance matrix computed from estimated functions may be biased upwards. We display a set of estimated neuronal Poisson-process intensity functions where this bias is substantial, and we discuss two methods for accounting for estimation variation. One method uses a random-coefficient model, which requires all functions to be fitted with the same basis functions. An alternative method removes the same-basis restriction by means of a hierarchical Gaussian process model. In a small simulation study the hierarchical Gaussian process model outperformed the random-coefficient model and greatly reduced the bias in the estimated first eigenvalue that would result from ignoring estimation variability. For the neuronal data the hierarchical Gaussian process estimate of the first eigenvalue was much smaller than the naive estimate that ignored variability due to function estimation. The neuronal setting also illustrates the benefit of incorporating alignment parameters into the hierarchical scheme.