
Queen’s Economics Department Working Paper No. 1135 (Revised)
Wild Bootstrap Tests for IV Regression
Russell Davidson
McGill University
James G. MacKinnon
Queen’s University
Department of Economics
Queen’s University
94 University Avenue
Kingston, Ontario, Canada
K7L 3N6
Revised 3-2008

Wild Bootstrap Tests for IV Regression
Russell Davidson
GREQAM
Centre de la Vieille Charité
2 rue de la Charité
13236 Marseille cedex 02, France
Department of Economics
McGill University
Montreal, Quebec, Canada
H3A 2T7
email: Russell.Davidson@mcgill.ca
and
James G. MacKinnon
Department of Economics
Queen’s University
Kingston, Ontario, Canada
K7L 3N6
email: jgm@econ.queensu.ca
Abstract
We propose a wild bootstrap procedure for linear regression models estimated by
instrumental variables. Like other bootstrap procedures that we have proposed else-
where, it uses efficient estimates of the reduced-form equation(s). Unlike them, it takes
account of possible heteroskedasticity of unknown form. We apply this procedure to
t tests, including heteroskedasticity-robust t tests, and to the Anderson-Rubin test. We
provide simulation evidence that it works far better than older methods, such as the
pairs bootstrap. We also show how to obtain reliable confidence intervals by inverting
bootstrap tests. An empirical example illustrates the utility of these procedures.
Keywords: Instrumental variables estimation, two-stage least squares, weak instru-
ments, wild bootstrap, pairs bootstrap, residual bootstrap, confidence intervals,
Anderson-Rubin test
JEL codes: C12, C15, C30
This research was supported, in part, by grants from the Social Sciences and Humanities
Research Council of Canada, the Canada Research Chairs program (Chair in Economics,
McGill University), and the Fonds Québécois de Recherche sur la Société et la Culture. We
are grateful to Arthur Sweetman for a valuable suggestion and to two referees and an associate
editor for very helpful comments.
Revised, March 2008
Minor corrections, May 2009, November 2011, and November 2013

1. Introduction
It is often difficult to make reliable inferences from regressions estimated using
instrumental variables. This is especially true when the instruments are weak. There is
an enormous literature on this subject, much of it quite recent. Most of the papers
focus on the case in which there is just one endogenous variable on the right-hand
side of the regression, and the problem is to test a hypothesis about the coefficient of
that variable. In this paper, we also focus on this case, but, in addition, we discuss
confidence intervals, and we allow the number of endogenous variables to exceed one.
One way to obtain reliable inferences is to use statistics with better properties than
those of the usual IV t statistic. These include the famous Anderson-Rubin, or AR,
statistic proposed in Anderson and Rubin (1949) and extended in Dufour and Taamouti
(2005, 2007), the Lagrange Multiplier, or K, statistic proposed in Kleibergen (2002),
and the conditional likelihood ratio, or CLR, test proposed in Moreira (2003). A
detailed analysis of several tests is found in Andrews, Moreira, and Stock (2006).
A second way to obtain reliable inferences is to use the bootstrap. This approach
has been much less popular, probably because the simplest bootstrap methods for this
problem do not work very well. See, for example, Flores-Lagunes (2007). However, the
more sophisticated bootstrap methods recently proposed in Davidson and MacKinnon
(2008) work very much better than traditional bootstrap procedures, even when they
are combined with the usual t statistic.
One advantage of the t statistic over the AR, K, and CLR statistics is that it can
easily be modified to be asymptotically valid in the presence of heteroskedasticity of
unknown form. But existing procedures for bootstrapping IV t statistics either are
not valid in this case or work badly in general. The main contribution of this paper
is to propose a new bootstrap data generating process (DGP) which is valid under
heteroskedasticity of unknown form and works well in finite samples even when the
instruments are quite weak. This is a wild bootstrap version of one of the methods
proposed in Davidson and MacKinnon (2008). Using this bootstrap method together
with a heteroskedasticity-robust t statistic generally seems to work remarkably well,
even though it is not asymptotically valid under weak instrument asymptotics. The
method can also be used with other test statistics that are not heteroskedasticity-
robust. It seems to work particularly well when used with the AR statistic, probably
because the resulting test is asymptotically valid under weak instrument asymptotics.
In the next section, we discuss six bootstrap methods that can be applied to test
statistics for the coefficient of the single right-hand side endogenous variable in a linear
regression model estimated by IV. Three of these have been available for some time,
two were proposed in Davidson and MacKinnon (2008), and one is a new procedure
based on the wild bootstrap. In Section 3, we discuss the asymptotic validity of
several tests based on this new wild bootstrap method. In Section 4, we investigate
the finite-sample performance of the new bootstrap method and some existing ones by
simulation. Our simulation results are quite extensive and are presented graphically.

In Section 5, we briefly discuss the more general case in which there are two or more
endogenous variables on the right-hand side. In Section 6, we discuss how to obtain
confidence intervals by inverting bootstrap tests. Finally, in Section 7, we present an
empirical application that involves estimation of the return to schooling.
2. Bootstrap Methods for IV Regression
In most of this paper, we deal with the two-equation model

    y1 = βy2 + Zγ + u1,   (1)
    y2 = Wπ + u2.         (2)

Here y1 and y2 are n-vectors of observations on endogenous variables, Z is an n × k
matrix of observations on exogenous variables, and W is an n × l matrix of exogenous
instruments with the property that S(Z), the subspace spanned by the columns of Z,
lies in S(W), the subspace spanned by the columns of W. Equation (1) is a structural
equation, and equation (2) is a reduced-form equation. Observations are indexed by i,
so that, for example, y1i denotes the i-th element of y1.
We assume that l > k. This means that the model is either just identified or over-
identified. The disturbances are assumed to be serially uncorrelated. When they are
homoskedastic, they have a contemporaneous covariance matrix

    Σ ≡ [ σ1²     ρσ1σ2 ]
        [ ρσ1σ2   σ2²   ].

However, we will often allow them to be heteroskedastic with unknown (but bounded)
variances σ1i² and σ2i² and correlation coefficient ρi that may depend on Wi, the row
vector of instrumental variables for observation i.
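To make the setup concrete, the following is a minimal simulation sketch of the model (1)-(2) in Python with NumPy. The parameter values, skedastic functions, and variable names are our own illustrative choices, not those used in the paper; including the columns of Z among the instruments is one simple way to ensure that S(Z) lies in S(W).

```python
import numpy as np

rng = np.random.default_rng(42)
n, k, l = 100, 2, 6               # sample size, columns of Z, columns of W
beta = 1.0
gamma = np.ones(k)
pi = np.full(l, 0.5)              # reduced-form coefficients

Z = rng.standard_normal((n, k))
# Putting Z among the instruments guarantees that S(Z) lies in S(W).
W = np.hstack([Z, rng.standard_normal((n, l - k))])

# Heteroskedastic, correlated disturbances whose variances depend on W_i
# (illustrative skedastic functions of the last instrument).
rho = 0.5
sigma1 = np.exp(0.2 * W[:, -1])
sigma2 = np.exp(0.1 * W[:, -1])
e1 = rng.standard_normal(n)
e2 = rho * e1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
u1, u2 = sigma1 * e1, sigma2 * e2

y2 = W @ pi + u2                  # reduced-form equation (2)
y1 = beta * y2 + Z @ gamma + u1   # structural equation (1)
```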
The usual t statistic for β = β0 can be written as

    ts(β̂, β0) = (β̂ − β0) / ( σ̂1 ‖PW y2 − PZ y2‖^{−1} ),   (3)

where β̂ is the generalized IV, or 2SLS, estimate of β, PW and PZ are the matrices
that project orthogonally on to the subspaces S(W) and S(Z), respectively, and ‖·‖
denotes the Euclidean length of a vector. In equation (3),

    σ̂1 = ( (1/n) û1⊤û1 )^{1/2} = ( (1/n) (y1 − β̂y2 − Zγ̂)⊤(y1 − β̂y2 − Zγ̂) )^{1/2}   (4)

is the usual 2SLS estimate of σ1. Here γ̂ denotes the IV estimate of γ, and û1 is the
usual vector of IV residuals. Many regression packages divide by n − k − 1 instead of
by n. Since σ̂1 as defined in (4) is not necessarily biased downwards, we do not do so.

When homoskedasticity is not assumed, the usual t statistic (3) should be replaced by
the heteroskedasticity-robust t statistic

    th(β̂, β0) = (β̂ − β0) / sh(β̂),   (5)

where

    sh(β̂) ≡ [ Σ_{i=1}^n û1i² (PW y2 − PZ y2)i² ]^{1/2} / ‖PW y2 − PZ y2‖².   (6)

Here (PW y2 − PZ y2)i denotes the i-th element of the vector PW y2 − PZ y2. Expression
(6) is what most regression packages routinely print as a heteroskedasticity-consistent
standard error for β̂. It is evidently the square root of a sandwich variance estimate.
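The sandwich standard error (6) uses the same ingredients as (3). Here is a sketch, with our own naming, of the heteroskedasticity-robust t statistic (5); it is an HC0-style transcription of expression (6), again computed didactically via explicit projections.

```python
import numpy as np

def robust_iv_t_stat(y1, y2, Z, W, beta0=0.0):
    """Heteroskedasticity-robust t statistic (5) using the sandwich
    standard error (6). A didactic HC0-style sketch."""
    PW = W @ np.linalg.solve(W.T @ W, W.T)
    PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    X = np.column_stack([y2, Z])
    Xhat = PW @ X
    coef = np.linalg.solve(Xhat.T @ X, Xhat.T @ y1)
    beta_hat, gamma_hat = coef[0], coef[1:]
    u1_hat = y1 - beta_hat * y2 - Z @ gamma_hat
    v = PW @ y2 - PZ @ y2
    # Equation (6): sqrt( sum_i u1i_hat^2 * v_i^2 ) / ||v||^2
    s_h = np.sqrt(np.sum(u1_hat**2 * v**2)) / (v @ v)
    return beta_hat, (beta_hat - beta0) / s_h
```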
The basic idea of bootstrap testing is to compare the observed value of some test
statistic, say τ̂, with the empirical distribution of a number of bootstrap test statistics,
say τ*_j, for j = 1, . . . , B, where B is the number of bootstrap replications. The
bootstrap statistics are generated using the bootstrap DGP, which must satisfy the
null hypothesis tested by the bootstrap statistics. When α is the level of the test, it is
desirable that α(B + 1) should be an integer, and a commonly used value of B is 999.
See Davidson and MacKinnon (2000) for more on how to choose B appropriately. If
we are prepared to assume that τ is symmetrically distributed around the origin, then
it is reasonable to use the symmetric bootstrap P value

    p̂_s(τ̂) = (1/B) Σ_{j=1}^B I( |τ*_j| > |τ̂| ).   (7)

We reject the null hypothesis whenever p̂_s(τ̂) < α.
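Equation (7) transcribes directly into plain Python; the function name below is ours, and the caller is assumed to supply the B bootstrap statistics however they were generated.

```python
def symmetric_boot_pvalue(tau_hat, tau_star):
    """Symmetric bootstrap P value, equation (7): the fraction of the B
    bootstrap statistics exceeding the observed one in absolute value."""
    B = len(tau_star)
    return sum(abs(t) > abs(tau_hat) for t in tau_star) / B
```

With B = 999 and α = 0.05, α(B + 1) = 50 is an integer, as recommended above.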
For test statistics that are always positive, such as the AR and K statistics that will be
discussed in the next section, we can use (7) without taking absolute values, and this
is really the only sensible way to proceed. In the case of IV t statistics, however, the
probability of rejecting in one direction can be very much greater than the probability
of rejecting in the other, because β̂ is often biased. In such cases, we can use the
equal-tail bootstrap P value

    p̂_et(τ̂) = 2 min( (1/B) Σ_{j=1}^B I(τ*_j ≤ τ̂), (1/B) Σ_{j=1}^B I(τ*_j > τ̂) ).   (8)

Here we actually perform two tests, one against values in the lower tail of the distribu-
tion and the other against values in the upper tail, and reject if either of them yields
a bootstrap P value less than α/2.
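Equation (8) is equally direct to transcribe (again, the function name is our own): it is twice the smaller of the two one-sided bootstrap P values.

```python
def equal_tail_boot_pvalue(tau_hat, tau_star):
    """Equal-tail bootstrap P value, equation (8): twice the smaller of
    the lower-tail and upper-tail bootstrap P values."""
    B = len(tau_star)
    lower = sum(t <= tau_hat for t in tau_star) / B
    upper = sum(t > tau_hat for t in tau_star) / B
    return 2.0 * min(lower, upper)
```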
Bootstrap testing can be expected to work well when the quantity bootstrapped is
approximately pivotal, that is, when its distribution changes little as the DGP varies
