Bootstrapping for penalized spline regression
Göran Kauermann, Gerda Claeskens and Jean D. Opsomer
DEPARTMENT OF DECISION SCIENCES AND INFORMATION MANAGEMENT (KBI)
Faculty of Economics and Applied Economics
KBI 0609

Bootstrapping for Penalized Spline Regression
Göran Kauermann
Universität Bielefeld
Gerda Claeskens
Katholieke Universiteit Leuven
J. D. Opsomer
Iowa State University
14th February 2006
Abstract
We describe and contrast several different bootstrapping procedures for penalized spline smoothers. The bootstrapping procedures considered are variations on existing methods, developed under two different probabilistic frameworks. Under the first framework, penalized spline regression is considered an estimation technique to find an unknown smooth function. The smooth function is represented in a high dimensional spline basis, with spline coefficients estimated in a penalized form. Under the second framework, the unknown function is treated as a realization of a set of random spline coefficients, which are then predicted in a linear mixed model. We describe how bootstrapping methods can be implemented under both frameworks, and we show in theory and through simulations and examples that bootstrapping provides valid inference in both cases. We compare the inference obtained under both frameworks, and conclude that the latter generally produces better results than the former. The bootstrapping ideas are extended to hypothesis testing, where parametric components in a model are tested against nonparametric alternatives.
Abbreviated title: “Bootstrapping for Penalized Splines.”
AMS 1991 subject classifications. Primary 62G08; secondary 62G09.
Key words: Mixed Model, Nonparametric Regression, Resampling, Nonparametric Hypothesis Testing.

1 Introduction
The objective of nonparametric regression is to model the mean function of a response variable Y by some smooth but otherwise unspecified function µ(x), with x a continuous covariate. Based on a sample of data pairs (x_i, y_i), i = 1, …, n, two important classes of methods for estimating µ(x) are local approaches (see for instance Fan and Gijbels, 1996) and spline smoothing (see for instance Wahba, 1992 or Eubank, 1999). Both methods can be applied in more complex models like Additive Models (Hastie and Tibshirani, 1990), Varying Coefficient Models (Hastie and Tibshirani, 1993) or in generalized response models (Green and Silverman, 1994 or Bowman and Azzalini, 1997). In recent years, penalized spline regression (often referred to as P-splines) has received renewed attention as a powerful alternative smoothing method. Originally suggested by O’Sullivan (1986), the method has been made popular by Eilers and Marx (1996) and more recently through the book by Ruppert, Wand, and Carroll (2003). The main idea of penalized spline regression is to fit the function µ(x) parametrically with a sufficiently flexible spline basis. Instead of simple parametric estimation, however, a penalty is imposed on the spline coefficients to achieve a smooth fit. One technical benefit of this approach is that it reveals a link to linear mixed models (see Wand, 2003). The resulting affinity to linear mixed models is advantageous and can be exploited in various ways. In particular, the smoothing or penalty parameters play the role of a ratio of variances in the mixed model, which suggests the application of maximum likelihood theory for estimation (see for instance Kauermann, 2004).
For notational simplicity, we restrict the presentation to the standard smoothing model Y = µ(x) + ε with ε as zero mean residuals, even though the examples later in this article mirror more complex models. Estimation of µ(x) is carried out by penalized spline regression. Under this method, we first replace µ(x) by the parametric form Xβ + Zu, where X is some low dimensional basis, e.g. a line, while Z is high dimensional, e.g. a basis built from truncated line segments. The main assumption is that Z is sufficiently complex and high dimensional, so that the modelling bias µ(x) − (Xβ + Zu) is of ignorable size compared to the stochastic estimation error. Theoretical results on how large the dimension of the spline basis should be in relation to the sample size are rudimentary, even though Cardot (2002) provides a good starting point. However, it has been found in practice that the actual specification of Z and its dimension has little influence on the fit as long as the dimension of Z is sufficiently large and a penalized fit is pursued. In fact, Ruppert (2002) concludes that “it may be surprising that a default that uses at most 35 or 40 knots [= the dimension of basis Z] could be recommended for effectively all sample sizes and for all smooth regression functions without too many oscillations”.
Once a basis is selected, a penalized fit is pursued by imposing a penalty on the spline
coefficients u and estimating by least squares regression, which results in a ridge regression
estimate. The resulting penalized fit is equivalently achieved by assuming the spline
coefficients u to be random, that is formulating an a priori distribution on u. This leads
to a linear mixed model and the best linear unbiased prediction (BLUP) of u is equivalent
to the penalized smooth fit, if the penalty is selected to be equal to the ratio of the
variances of ε and u.
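
To make the ridge form concrete, here is a minimal NumPy sketch (our illustration, not code from the paper) of the penalized least squares estimate: the low dimensional coefficients β are left unpenalized while the spline coefficients u receive a ridge penalty λ, which coincides with the mixed-model BLUP when λ = σ²_ε/σ²_u.

```python
import numpy as np

def penalized_fit(X, Z, y, lam):
    """Penalized least squares for y = X beta + Z u + eps with penalty lam*||u||^2.

    Solving (C'C + lam*D) theta = C'y is a ridge regression on u; with
    lam = sigma_eps^2 / sigma_u^2 it equals the mixed-model BLUP of u.
    """
    C = np.hstack([X, Z])                        # full design matrix (X, Z)
    p, K = X.shape[1], Z.shape[1]
    D = np.diag(np.r_[np.zeros(p), np.ones(K)])  # penalize only the u block
    theta = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    return theta[:p], theta[p:]                  # beta_hat, u_hat
```
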
Our objective is to develop a bootstrap that takes advantage of the mixed model structure, and to compare it with a bootstrap that treats µ(x) as fixed and only ε as random. Bootstrapping for such “smoothing models” has a long history, with Härdle and Bowman (1988) and Härdle and Marron (1991) as two important examples. See also Mammen (1993), Härdle, Huet, and Jolivet (1995) or Galindo, Liang, Kauermann, and Carroll (2001) for some extensions. We refer to Shao and Tu (1995) for an overview. A major concern when bootstrapping in smooth models is the bias occurring due to smoothing, which is not accounted for if one applies a naive bootstrap. This requires the use of a pilot estimate with a relatively large smoothing parameter before the actual bootstrapping is pursued (see Härdle and Marron, 1991). Following the discussion in Ruppert, Wand, and Carroll (2003, ch. 6), we show here that the bias problem can be circumvented in penalized spline smoothing if a mixed model formulation is used for bootstrapping.
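
As a rough illustration of the pilot-estimate idea (a sketch under our own naming, not the authors’ implementation), a residual bootstrap in the smoothing model generates responses around a deliberately oversmoothed pilot fit so that the bias of the final estimator is reflected in the bootstrap replicates. Here `fit` stands in for any penalized smoother returning fitted values, e.g. one assembled from the ridge solve sketched above.

```python
import numpy as np

def residual_bootstrap(y, fit, lam, lam_pilot, n_boot=500, seed=0):
    """Residual bootstrap around an oversmoothed pilot fit.

    `fit(y, lam)` is assumed to return the fitted values mu_hat(x_i);
    choosing lam_pilot > lam gives the oversmoothed pilot estimate
    used to generate the bootstrap responses.
    """
    rng = np.random.default_rng(seed)
    mu_pilot = fit(y, lam_pilot)        # pilot fit, large smoothing parameter
    resid = y - mu_pilot
    resid = resid - resid.mean()        # centre the residuals
    curves = np.empty((n_boot, len(y)))
    for b in range(n_boot):
        y_star = mu_pilot + rng.choice(resid, size=len(y), replace=True)
        curves[b] = fit(y_star, lam)    # refit at the working smoothing level
    return curves                       # pointwise quantiles give confidence bands
```
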
We describe a number of bootstrap versions for both the mixed model and the smoothing model formulations, including simple residual resampling, wild bootstrapping and bootstrapping of correlated spline coefficients. We also show how residuals can be adjusted to compensate for any small sample bias. The adjustment again depends on the model used, that is a smoothing model or a mixed model, respectively. Bootstrapping is employed in our paper for two purposes. First, it serves to mirror estimation variability. That is, we derive bootstrap based confidence bands for our smooth fit. Second, we take advantage of the technique for model validation and model checking. In particular, we use bootstrapping for testing particular components of the model.
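
For the wild bootstrap variant mentioned above, residuals are not resampled but multiplied by independent weights with zero mean and unit variance. One common concrete choice (our illustration; the paper does not prescribe this particular law) is Mammen’s two-point distribution, which also matches the third moment:

```python
import numpy as np

def wild_residuals(resid, rng):
    """Wild bootstrap draws r_i* = r_i * v_i with E[v] = 0, E[v^2] = 1.

    Mammen's two-point distribution additionally gives E[v^3] = 1,
    which is helpful under heteroscedastic errors.
    """
    a = (1 - np.sqrt(5)) / 2                   # ~ -0.618
    b = (1 + np.sqrt(5)) / 2                   # ~  1.618
    p = (np.sqrt(5) + 1) / (2 * np.sqrt(5))    # P(v = a) ~ 0.724
    v = np.where(rng.random(len(resid)) < p, a, b)
    return resid * v

# Example: r_star = wild_residuals(residuals, np.random.default_rng(1))
```
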

The article is organized as follows. In Section 2, we introduce penalized spline smoothing in the two models considered, i.e. the smoothing model and the linear mixed model. We then suggest two resulting bootstrap procedures. Before providing simulations, we propose some small sample adjustment to improve the performance of the bootstrap routine. The bootstrap is then applied in Section 3 to two data examples making use of additive models. In Section 4 we employ the bootstrap in testing for nonparametric and semiparametric models, which shows the applicability of our suggestions in more complicated regression settings.
2 Penalized Spline Smoothing
2.1 Estimation
We consider the smoothing model

y_i | x_i = µ(x_i) + ε_i

with ε_i ∼ N(0, σ²_ε) as independent errors. Function µ(x) is assumed to be smooth but
otherwise unspecified. Following the idea of penalized spline smoothing sketched in Section 1, we approximate µ(x) by µ(x_i) = C(x_i)θ + δ(x_i), where C(x_i) is a high dimensional
basis chosen in advance. In this form, δ(x) denotes the approximation bias of the spline
basis in C(x). If C(x) is chosen as a sufficiently flexible basis, δ(x) does not contain
relevant information and will therefore be dropped subsequently. This means we assume
the function µ(x) to be representable by a high dimensional parametric form C(x)θ. It
is convenient to decompose C(x) into a low dimensional part X and a high dimensional
component Z (see Ruppert, Wand, and Carroll, 2003). For instance X = (1, x, …, x^p) can contain a low dimensional polynomial form, while Z is a truncated polynomial basis Z = ((x − τ_1)^p_+, …, (x − τ_K)^p_+), where (x)^p_+ = x^p for x > 0 and zero otherwise. Following
Ruppert (2002), we choose K large but less than the sample size n (or n − p − 1). As a
practical choice, we suggest K = min(n/4, 40). Alternatively, one may use the selection
routine suggested in Ruppert (2002), but to keep the approach simple we fix K with the
above rule of thumb. Once K is chosen, we select the knots τ_k to cover the range of x values using quantiles. This formulation brings us to the parametric model

Y | x, u ∼ N(Xβ + Zu, σ²_ε I)    (1)
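
The following Python sketch (our construction, following the rule of thumb stated above) builds the design pieces X and Z with quantile-based knots:

```python
import numpy as np

def truncated_poly_basis(x, p=1, K=None):
    """Build X = (1, x, ..., x^p) and Z = ((x - tau_1)_+^p, ..., (x - tau_K)_+^p).

    K defaults to the rule of thumb K = min(n/4, 40); the knots tau_k are
    interior quantiles of x, covering its range.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if K is None:
        K = int(min(n / 4, 40))
    taus = np.quantile(x, np.linspace(0, 1, K + 2)[1:-1])  # interior quantile knots
    X = np.vander(x, p + 1, increasing=True)               # columns 1, x, ..., x^p
    Z = np.maximum(x[:, None] - taus[None, :], 0.0) ** p   # truncated polynomials
    return X, Z, taus
```
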

References

de Boor, C. A Practical Guide to Splines. Springer.

Efron, B. and Tibshirani, R. J. An Introduction to the Bootstrap. Chapman & Hall.

Hastie, T. J. and Tibshirani, R. J. Generalized Additive Models. Chapman & Hall.

Wahba, G. Spline Models for Observational Data. SIAM.