NONPARAMETRIC ESTIMATION OF AVERAGE TREATMENT EFFECTS
UNDER EXOGENEITY: A REVIEW*
Guido W. Imbens
Abstract—Recently there has been a surge in econometric work focusing
on estimating average treatment effects under various sets of assumptions.
One strand of this literature has developed methods for estimating average
treatment effects for a binary treatment under assumptions variously
described as exogeneity, unconfoundedness, or selection on observables.
The implication of these assumptions is that systematic (for example,
average or distributional) differences in outcomes between treated and
control units with the same values for the covariates are attributable to the
treatment. Recent analysis has considered estimation and inference for
average treatment effects under weaker assumptions than typical of the
earlier literature by avoiding distributional and functional-form assump-
tions. Various methods of semiparametric estimation have been proposed,
including estimating the unknown regression functions, matching, meth-
ods using the propensity score such as weighting and blocking, and
combinations of these approaches. In this paper I review the state of this
literature and discuss some of its unanswered questions, focusing in
particular on the practical implementation of these methods, the plausi-
bility of this exogeneity assumption in economic applications, the relative
performance of the various semiparametric estimators when the key
assumptions (unconfoundedness and overlap) are satisfied, alternative
estimands such as quantile treatment effects, and alternate methods such
as Bayesian inference.
I. Introduction
Since the work by Ashenfelter (1978), Card and Sullivan (1988), Heckman and Robb (1984), LaLonde
(1986), and others, there has been much interest in econo-
metric methods for estimating the effects of active labor
market programs such as job search assistance or classroom
teaching programs. This interest has led to a surge in
theoretical work focusing on estimating average treatment
effects under various sets of assumptions. For general surveys of this literature, see Angrist and Krueger (2000), Heckman, LaLonde, and Smith (2000), and Blundell and Costa-Dias (2002).
One strand of this literature has developed methods for
estimating the average effect of receiving or not receiving a
binary treatment under the assumption that the treatment
satis es some form of e xogeneity. Different versions of this
assumption are referred to as unconfoundedness (Rosen-
baum & Rubin, 1983a), selection on observables (Barnow,
Cain, & G oldberger, 1980; Fitzgerald, Gottschalk, & Mof-
tt, 1998), or conditional independence (Lechner, 1999). In
the remainder of this paper I will use the terms unconfoundedness and exogeneity interchangeably to denote the assumption that the receipt of treatment is independent of the potential outcomes with and without treatment if certain observable covariates are held constant. The implication of these assumptions is that systematic (for example, average or distributional) differences in outcomes between treated and control units with the same values for these covariates
are attributable to the treatment.
Much of the recent work, building on the statistical
literature by Cochran (1968), Cochran and Rubin (1973),
Rubin (1973a, 1973b, 1977, 1978), Rosenbaum and Rubin
(1983a, 1983b, 1984), Holland (1986), and others, considers
estimation and inference without distributional and functional-form assumptions. Hahn (1998) derived efficiency bounds assuming only unconfoundedness and some regularity conditions and proposed an efficient estimator. Various alternative estimators have been proposed given these conditions. These estimation methods can be grouped into five categories: (i) methods based on estimating the unknown regression functions of the outcome on the covariates (Hahn, 1998; Heckman, Ichimura, & Todd, 1997, 1998; Imbens, Newey, & Ridder, 2003), (ii) matching on covariates (Rosenbaum, 1995; Abadie and Imbens, 2002), (iii)
methods based on the propensity score, including blocking
(Rosenbaum & Rubin, 1984) and weighting (Hirano, Im-
bens, & Ridder, 2003), (iv) combinations of these ap-
proaches, for example, weighting and regression (Robins &
Rotnitzky, 1995) or matching and regression (Abadie &
Imbens, 2002), and (v) Bayesian methods, which have
found relatively little following since Rubin (1978). In this
paper I will review the state of this literature—with partic-
ular emphasis on implications for empirical work—and
discuss some of the remaining questions.
The organization of the paper is as follows. In section II
I will introduce the notation and the assumptions used for identification. I will also discuss the difference between
population- and sample-average treatment effects. The recent econometric literature has largely focused on estimation of the population-average treatment effect and its counterpart for the subpopulation of treated units. An alternative,
following the early experimental literature (Fisher, 1925;
Neyman, 1923), is to consider estimation of the average
effect of the treatment for the units in the sample. Many of
the estimators proposed can be interpreted as estimating
either the average treatment effect for the sample at hand, or
the average treatment effect for the population. Although the
choice of estimand may not affect the form of the estimator,
it has implications for the efficiency bounds and for the form of estimators of the asymptotic variance; the variances of estimators for the sample-average treatment effect are
Received for publication October 22, 2002. Revision accepted for
publication June 4, 2003.
* University of California at Berkeley and NBER
This paper was presented as an invited lecture at the Australian and
European meetings of the Econometric Society in July and August 2003.
I am also grateful to Joshua Angrist, Jane Herr, Caroline Hoxby, Charles Manski, Xiangyi Meng, Robert Moffitt, and Barbara Sianesi, and two
referees for comments, and to a number of collaborators, Alberto Abadie,
Joshua Angrist, Susan Athey, Gary Chamberlain, Keisuke Hirano, V.
Joseph Hotz, Charles Manski, Oscar Mitnik, Julie Mortimer, Jack Porter,
Whitney Newey, Geert Ridder, Paul Rosenbaum, and Donald Rubin for
many discussions on the topics of this paper. Financial support for this
research was generously provided through NSF grants SBR 9818644 and
SES 0136789 and the Giannini Foundation.
The Review of Economics and Statistics, February 2004, 86(1): 4–29
© 2004 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology

generally smaller. In section II, I will also discuss alterna-
tive estimands. Almost the entire literature has focused on
average effects. However, in many cases such measures
may mask important distributional changes. These can be
captured more easily by focusing on quantiles of the distri-
butions of potential outcomes, in the presence and absence
of the treatment (Lehmann, 1974; Doksum, 1974; Firpo,
2003).
In section III, I will discuss in more detail some of the
recently proposed semiparametric estimators for the average
treatment effect, including those based on regression,
matching, and the propensity score. I will focus particularly
on implementation, and compare the different decisions
faced regarding smoothing parameters using the various
estimators.
In section IV, I will discuss estimation of the variances of
these average treatment effect estimators. For most of the
estimators introduced in the recent literature, corresponding estimators for the variance have also been proposed, typically requiring additional nonparametric regression. In practice, however, researchers often rely on bootstrapping, although this method has not been formally justified. In addition, if one is interested in the average treatment effect for the sample, bootstrapping is clearly inappropriate. Here I discuss in more detail a simple estimator for the variance
for matching estimators, developed by Abadie and Imbens
(2002), that does not require additional nonparametric esti-
mation.
Section V discusses different approaches to assessing the
plausibility of the two key assumptions: exogeneity or
unconfoundedness, and overlap in the covariate distributions. The first of these assumptions is in principle untestable. Nevertheless a number of approaches have been proposed that are useful for addressing its credibility (Heckman and Hotz, 1989; Rosenbaum, 1984b). One may also wish to assess the responsiveness of the results to this assumption using a sensitivity analysis (Rosenbaum & Rubin, 1983b; Imbens, 2003), or, in its extreme form, a bounds analysis (Manski, 1990, 2003). The second assumption is that there exists appropriate overlap in the covariate distributions of the treated and control units. That is effectively an assumption on the joint distribution of observable variables. However, as it only involves inequality restrictions, there are no direct tests of this null. Nevertheless, in practice it is often very important to assess whether there is sufficient overlap to draw credible inferences. Lacking overlap for the full sample, one may wish to limit inferences to the average effect for the subset of the covariate space where there exists overlap between the treated and control observations.
In Section VI, I discuss a number of implementations of average treatment effect estimators. The first set of implementations involves comparisons of the nonexperimental estimators to results based on randomized experiments, allowing direct tests of the unconfoundedness assumption. The second set consists of simulation studies, using data created either to fulfill the unconfoundedness assumption or to fail it in a known way, designed to compare the applicability of the various treatment effect estimators in these diverse settings.
This survey will not address alternatives for estimating average treatment effects that do not rely on exogeneity assumptions. This includes approaches where selected observed covariates are not adjusted for, such as instrumental variables analyses (Björklund & Moffitt, 1987; Heckman & Robb, 1984; Imbens & Angrist, 1994; Angrist, Imbens, & Rubin, 1996; Ichimura & Taber, 2000; Abadie, 2003a; Chernozhukov & Hansen, 2001). I will also not discuss methods exploiting the presence of additional data, such as difference in differences in repeated cross sections (Abadie, 2003b; Blundell et al., 2002; Athey and Imbens, 2002) and regression discontinuity where the overlap assumption is violated (van der Klaauw, 2002; Hahn, Todd, & van der Klaauw, 2000; Angrist & Lavy, 1999; Black, 1999; Lee, 2001; Porter, 2003). I will also limit the discussion to binary treatments, excluding models with static multivalued treatments as in Imbens (2000) and Lechner (2001) and models with dynamic treatment regimes as in Ham and LaLonde (1996), Gill and Robins (2001), and Abbring and van den Berg (2003). Reviews of many of these methods can be found in Shadish, Campbell, and Cook (2002), Angrist and Krueger (2000), Heckman, LaLonde, and Smith (2000), and Blundell and Costa-Dias (2002).
II. Estimands, Identification, and Efficiency Bounds
A. Definitions
In this paper I will use the potential-outcome notation that
dates back to the analysis of randomized experiments by
Fisher (1935) and Neyman (1923). After being forcefully
advocated in a series of papers by Rubin (1974, 1977,
1978), this notation is now standard in the literature on both
experimental and nonexperimental program evaluation.
We begin with N units, indexed by i = 1, . . . , N, viewed as drawn randomly from a large population. Each unit is characterized by a pair of potential outcomes, Y_i(0) for the outcome under the control treatment and Y_i(1) for the outcome under the active treatment. In addition, each unit has a vector of characteristics, referred to as covariates, pretreatment variables, or exogenous variables, and denoted by X_i.¹ It is important that these variables are not affected by the treatment. Often they take their values prior to the unit being exposed to the treatment, although this is not sufficient for the conditions they need to satisfy. Importantly, this vector of covariates can include lagged outcomes.
¹ Calling such variables exogenous is somewhat at odds with several formal definitions of exogeneity (e.g., Engle, Hendry, & Richard, 1974), as knowledge of their distribution can be informative about the average treatment effects. It does, however, agree with common usage. See, for example, Manski et al. (1992, p. 28). See also Frölich (2002) and Hirano et al. (2003) for additional discussion.

Finally, each unit is exposed to a single treatment; W_i = 0 if unit i receives the control treatment, and W_i = 1 if unit i receives the active treatment. We therefore observe for each unit the triple (W_i, Y_i, X_i), where Y_i is the realized outcome:

Y_i ≡ Y_i(W_i) = { Y_i(0) if W_i = 0,
                   Y_i(1) if W_i = 1.

Distributions of (W, Y, X) refer to the distribution induced by the random sampling from the superpopulation.

Several additional pieces of notation will be useful in the remainder of the paper. First, the propensity score (Rosenbaum and Rubin, 1983a) is defined as the conditional probability of receiving the treatment,

e(x) ≡ Pr(W = 1 | X = x) = E[W | X = x].

Also, define, for w ∈ {0, 1}, the two conditional regression and variance functions

μ_w(x) ≡ E[Y(w) | X = x],   σ²_w(x) ≡ V(Y(w) | X = x).

Finally, let ρ(x) be the conditional correlation coefficient of Y(0) and Y(1) given X = x. As one never observes Y_i(0) and Y_i(1) for the same unit i, the data only contain indirect and very limited information about this correlation coefficient.²
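As a concrete illustration of these objects, the sketch below simulates a single binary covariate and computes empirical analogues of the propensity score e(x) and the regression functions μ_w(x). The covariate design, propensity values, and outcome means are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy superpopulation with a single binary covariate X.
# All numerical values here are assumptions for illustration.
N = 200_000
X = rng.binomial(1, 0.5, N)
e_true = np.where(X == 1, 0.7, 0.3)        # true propensity score e(x)
W = rng.binomial(1, e_true)                # treatment indicator
Y0 = 1.0 + 0.5 * X + rng.normal(0, 1, N)   # potential outcome Y(0)
Y1 = 2.0 + 0.5 * X + rng.normal(0, 1, N)   # potential outcome Y(1)
Y = np.where(W == 1, Y1, Y0)               # realized outcome Y = Y(W)

def propensity(x):
    """Empirical analogue of e(x) = E[W | X = x]."""
    return W[X == x].mean()

def mu(w, x):
    """Empirical analogue of E[Y | W = w, X = x]."""
    return Y[(W == w) & (X == x)].mean()
```

Because treatment here is assigned as a function of X alone, unconfoundedness holds by construction, and the observed-data regression function mu(w, x) recovers E[Y(w) | X = x].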
B. Estimands: Average Treatment Effects
In this discussion I will primarily focus on a number of
average treatment effects (ATEs). This is less limiting than
it may seem, however, as it includes averages of arbitrary
transformations of the original outcomes. Later I will return
briefly to alternative estimands that cannot be written in this
form.
The first estimand, and the most commonly studied in the econometric literature, is the population-average treatment effect (PATE):

τ^P = E[Y(1) − Y(0)].

Alternatively we may be interested in the population-average treatment effect for the treated (PATT; for example, Rubin, 1977; Heckman & Robb, 1984):

τ^P_T = E[Y(1) − Y(0) | W = 1].
Heckman and Robb (1984) and Heckman, Ichimura, and
Todd (1997) argue that the subpopulation of treated units is
often of more interest than the overall population in the
context of narrowly targeted programs. For example, if a
program is speci cally directed at individuals disadvan-
taged in the labor market, there is often little interest in the
effect of such a program on individuals with strong labor
market attachment.
I will also look at sample-average versions of these two population measures. These estimands focus on the average of the treatment effect in the specific sample, rather than in the population at large. They include the sample-average treatment effect (SATE)

τ^S = (1/N) Σ_{i=1}^{N} [Y_i(1) − Y_i(0)],

and the sample-average treatment effect for the treated (SATT)

τ^S_T = (1/N_T) Σ_{i: W_i=1} [Y_i(1) − Y_i(0)],

where N_T = Σ_{i=1}^{N} W_i is the number of treated units. The SATE and the SATT have received little attention in the recent econometric literature, although the SATE has a long tradition in the analysis of randomized experiments (for example, Neyman, 1923). Without further assumptions, the sample contains no information about the PATE beyond the SATE. To see this, consider the case where we observe the sample (Y_i(0), Y_i(1), W_i, X_i), i = 1, . . . , N; that is, we observe both potential outcomes for each unit. In that case τ^S = Σ_i [Y_i(1) − Y_i(0)]/N can be estimated without error.
Obviously, the best estimator for the population-average effect τ^P is τ^S. However, we cannot estimate τ^P without error even with a sample where all potential outcomes are observed, because we lack the potential outcomes for those population members not included in the sample. This simple argument has two implications. First, one can estimate the SATE at least as accurately as the PATE, and typically more so. In fact, the difference between the two variances is the variance of the treatment effect, which is zero only when the treatment effect is constant. Second, a good estimator for one ATE is automatically a good estimator for the other. One can therefore interpret many of the estimators for PATE or PATT as estimators for SATE or SATT, with lower implied standard errors, as discussed in more detail in section IIE.
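The thought experiment above, in which both potential outcomes are visible, can be sketched as follows. This is a hypothetical simulation; the data-generating values (a unit-level effect of 1 + 0.5·X_i) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample where both potential outcomes are observed.
N = 10_000
X = rng.normal(0, 1, N)
tau_i = 1.0 + 0.5 * X              # heterogeneous unit-level treatment effect
Y0 = X + rng.normal(0, 1, N)
Y1 = Y0 + tau_i
W = rng.binomial(1, 0.5, N)        # random assignment

sate = np.mean(Y1 - Y0)            # SATE: average effect in this sample
satt = np.mean((Y1 - Y0)[W == 1])  # SATT: average effect over treated units

# Under this design the PATE is E[1 + 0.5 X] = 1.0; the SATE estimates
# it with sampling error of order 1/sqrt(N), and with randomized W the
# SATT centers on the same value.
```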
A third pair of estimands combines features of the other two. These estimands, introduced by Abadie and Imbens (2002), focus on the ATE conditional on the sample distribution of the covariates. Formally, the conditional ATE (CATE) is defined as

τ(X) = (1/N) Σ_{i=1}^{N} E[Y_i(1) − Y_i(0) | X_i],

and the conditional ATE for the treated (CATT) is defined as

τ(X)_T = (1/N_T) Σ_{i: W_i=1} E[Y_i(1) − Y_i(0) | X_i].

² As Heckman, Smith, and Clements (1997) point out, however, one can draw some limited inferences about the correlation coefficient from the shape of the two marginal distributions of Y(0) and Y(1).

Using the same argument as in the previous paragraph, it can be shown that one can estimate CATE and CATT more accurately than PATE and PATT, but generally less accurately than SATE and SATT.
The difference in asymptotic variances forces the researcher to take a stance on what the quantity of interest is. For example, in a specific application one can legitimately reach the conclusion that there is no evidence, at the 95% level, that the PATE is different from zero, whereas there may be compelling evidence that the SATE and CATE are positive. Typically researchers in econometrics have focused on the PATE, but one can argue that it is of interest, when one cannot ascertain the sign of the population-level effect, to know whether one can determine the sign of the effect for the sample. Especially in cases, which are all too common, where it is not clear whether the sample is representative of the population of interest, results for the sample at hand may be of considerable interest.
C. Identi cation
We make the following ke y assumption about the treat-
ment assignment:
ASSUMPTION 2.1 (UNCONFOUNDEDNESS):
~Y~0!, Y~1!! \ WuX.
This assumption was rst articulated in this form by
Rosenbaum and Rubin (1983a), who refer to it as “ignorable
treatment assignment. Lechner (1999, 2002) refers to this
as the “conditional independence assumption. Following
work by Barnow, Cain, and Goldberger (1980) in a r egres-
sion setting it is also referred to as “selection on observ-
ables.
To see the link with standard exogeneity assumptions, suppose that the treatment effect is constant: τ = Y_i(1) − Y_i(0) for all i. Suppose also that the control outcome is linear in X_i:

Y_i(0) = α + X_i′β + ε_i,

with ε_i ⊥ X_i. Then we can write

Y_i = α + τ·W_i + X_i′β + ε_i.

Given the assumption of constant treatment effect, unconfoundedness is equivalent to independence of W_i and ε_i conditional on X_i, which would also capture the idea that W_i is exogenous. Without this assumption, however, unconfoundedness does not imply a linear relation with (mean-)independent errors.
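Under the assumed constant-effect linear model above, ordinary least squares on the observed data recovers τ even though treatment probability depends on X. A minimal sketch, with all parameter values chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Constant treatment effect tau with control outcome linear in X,
# matching the display above; parameter values are assumptions.
N = 50_000
alpha, tau, beta = 0.5, 2.0, 1.5
X = rng.normal(0, 1, N)
eps = rng.normal(0, 1, N)                      # eps independent of X
p = 1.0 / (1.0 + np.exp(-X))                   # treatment depends on X only
W = (rng.random(N) < p).astype(float)          # -> unconfounded assignment
Y = alpha + tau * W + beta * X + eps

# OLS of Y on (1, W, X): since eps is independent of (W, X),
# the coefficient on W is consistent for tau.
Z = np.column_stack([np.ones(N), W, X])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
tau_hat = float(coef[1])
```

Note that a regression of Y on W alone would be biased here, because W is correlated with X; the covariate adjustment is what delivers τ.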
Next, we make a second assumption regarding the joint distribution of treatments and covariates:

ASSUMPTION 2.2 (OVERLAP):

0 < Pr(W = 1 | X) < 1.

For many of the formal results one will also need smoothness assumptions on the conditional regression functions and the propensity score [μ_w(x) and e(x)], and moment conditions on Y(w). I will not discuss these regularity conditions here. Details can be found in the references for the specific estimators given below.
There has been some controversy about the plausibility of Assumptions 2.1 and 2.2 in economic settings, and thus about the relevance of the econometric literature that focuses on estimation and inference under these conditions for empirical work. In this debate it has been argued that agents' optimizing behavior precludes their choices being independent of the potential outcomes, whether or not conditional on covariates. This seems an unduly narrow view. In response I will offer three arguments for considering these assumptions.
The first is a statistical, data-descriptive motivation. A natural starting point in the evaluation of any program is a comparison of average outcomes for treated and control units. A logical next step is to adjust any difference in average outcomes for differences in exogenous background characteristics (exogenous in the sense of not being affected by the treatment). Such an analysis may not lead to the final word on the efficacy of the treatment, but its absence would seem difficult to rationalize in a serious attempt to understand the evidence regarding the effect of the treatment.
A second argument is that almost any evaluation of a treatment involves comparisons of units who received the treatment with units who did not. The question is typically not whether such a comparison should be made, but rather which units should be compared, that is, which units best represent the treated units had they not been treated. Economic theory can help in classifying variables into those that need to be adjusted for versus those that do not, on the basis of their role in the decision process (for example, whether they enter the utility function or the constraints). Given that, the unconfoundedness assumption merely asserts that all variables that need to be adjusted for are observed by the researcher. This is an empirical question, and not one that should be controversial as a general principle. It is clear that settings where some of these covariates are not observed will require strong assumptions to allow for identification. Such assumptions include instrumental variables settings where some covariates are assumed to be independent of the potential outcomes. Absent those assumptions, typically only bounds can be identified (as in Manski, 1990, 2003).
A third, related argument is that even when agents choose their treatment optimally, two agents with the same values for observed characteristics may differ in their treatment choices without invalidating the unconfoundedness assumption if the difference in their choices is driven by differences in unobserved characteristics that are themselves unrelated to the outcomes of interest. The plausibility of this will depend critically on the exact nature of the optimization

process faced by the agents. In particular it may be important that the objective of the decision maker is distinct from the outcome that is of interest to the evaluator. For example, suppose we are interested in estimating the average effect of a binary input (such as a new technology) on a firm's output.³ Assume production is a stochastic function of this input because other inputs (such as weather) are not under the firm's control: Y_i = g(W, ε_i). Suppose that profits are output minus costs (π_i = Y_i − c_i·W_i), and also that a firm chooses a production level to maximize expected profits, equal to output minus costs, conditional on the cost of adopting the new technology,

W_i = arg max_{w ∈ {0,1}} E[π(w) | c_i] = arg max_{w ∈ {0,1}} E[g(w, ε_i) − c_i·w | c_i],

implying

W_i = 1{E[g(1, ε_i) − g(0, ε_i) | c_i] ≥ c_i} = h(c_i).

If unobserved marginal costs c_i differ between firms, and these marginal costs are independent of the errors ε_i in the firms' forecast of production given inputs, then unconfoundedness will hold, as

(g(0, ε_i), g(1, ε_i)) ⊥ c_i.

Note that under the same assumptions one cannot necessarily identify the effect of the input on profits, for (π_i(0), π_i(1)) are not independent of c_i. For a related discussion, in the context of instrumental variables, see Athey and Stern (1998). Heckman, LaLonde, and Smith (2000) discuss alternative models that justify unconfoundedness. In these models individuals do attempt to optimize the same outcome that is the variable of interest to the evaluator. They show that selection-on-observables assumptions can be justified by imposing restrictions on the way individuals form their expectations about the unknown potential outcomes. In general, therefore, a researcher may wish to consider, either as a final analysis or as part of a larger investigation, estimates based on the unconfoundedness assumption.
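The firm example can be simulated under assumed functional forms. Here I take g(w, ε) = w + ε, so the expected gain from adoption is 1, and uniform costs on (0, 2); these are my own illustrative choices, not the paper's. The simulation shows the naive treated-control contrast recovering the output effect while being biased for the profit effect.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed production function g(w, eps) = w + eps; gain from adoption is 1.
N = 100_000
eps = rng.normal(0, 1, N)              # productivity shock, outside firm control
c = rng.uniform(0, 2, N)               # unobserved marginal cost, independent of eps
gain = 1.0                             # E[g(1, eps) - g(0, eps)]
W = (gain >= c).astype(int)            # adopt iff expected gain covers cost: h(c)

Y0, Y1 = eps, 1.0 + eps                # potential outputs
Y = np.where(W == 1, Y1, Y0)
profit0, profit1 = Y0, Y1 - c          # potential profits pi(w) = Y(w) - c*w

# Output: (Y(0), Y(1)) depend only on eps, independent of c and hence of
# W = h(c), so the simple treated-control contrast recovers the gain.
ate_output_naive = Y[W == 1].mean() - Y[W == 0].mean()

# Profits: pi(1) - pi(0) = 1 - c is correlated with W = h(c), so the same
# contrast is biased for the average profit effect E[1 - c] = 0.
ate_profit_true = (profit1 - profit0).mean()
ate_profit_naive = profit1[W == 1].mean() - profit0[W == 0].mean()
```

Adopters are exactly the low-cost firms (c ≤ 1), so conditioning on adoption selects favorable profit draws: the naive profit contrast centers on 1 − E[c | c ≤ 1] = 0.5 under these assumptions, not on the true average effect of 0.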
Given the two key assumptions, unconfoundedness and overlap, one can identify the average treatment effects. The key insight is that given unconfoundedness, the following equalities hold:

μ_w(x) = E[Y(w) | X = x] = E[Y(w) | W = w, X = x] = E[Y | W = w, X = x],

and thus μ_w(x) is identified. Thus one can estimate the average treatment effect τ by first estimating the average treatment effect for a subpopulation with covariates X = x:

τ(x) ≡ E[Y(1) − Y(0) | X = x] = E[Y(1) | X = x] − E[Y(0) | X = x]
     = E[Y(1) | X = x, W = 1] − E[Y(0) | X = x, W = 0]
     = E[Y | X = x, W = 1] − E[Y | X = x, W = 0];

followed by averaging over the appropriate distribution of x. To make this feasible, one needs to be able to estimate the expectations E[Y | X = x, W = w] for all values of w and x in the support of these variables. This is where the second assumption enters. If the overlap assumption is violated at X = x, it would be infeasible to estimate both E[Y | X = x, W = 1] and E[Y | X = x, W = 0], because at those values of x there would be either only treated or only control units.
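The identification argument can be sketched as a subclassification estimator with a discrete covariate: estimate τ(x) cell by cell as the treated-control mean difference, then average over the empirical distribution of X. All data-generating values below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Three covariate cells; propensity and treatment effect both vary with X,
# so a raw treated-control comparison is confounded. Values are assumed.
N = 100_000
X = rng.integers(0, 3, N)
e = np.array([0.2, 0.5, 0.8])[X]                # e(x) increasing in x
W = (rng.random(N) < e).astype(int)
tau_x = np.array([0.0, 1.0, 2.0])[X]            # cell-specific effect tau(x)
Y = X + tau_x * W + rng.normal(0, 1, N)

# Step 1: tau(x) = E[Y | X = x, W = 1] - E[Y | X = x, W = 0], cell by cell.
cells = np.unique(X)
tau_hat_x = np.array([Y[(X == x) & (W == 1)].mean()
                      - Y[(X == x) & (W == 0)].mean() for x in cells])

# Step 2: average over the distribution of X.
weights = np.array([(X == x).mean() for x in cells])
ate_hat = float(tau_hat_x @ weights)            # estimate of E[tau(X)] = 1.0

# For contrast: the unadjusted difference, which mixes the effect with
# the association between X and treatment.
naive = Y[W == 1].mean() - Y[W == 0].mean()
```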
Some researchers use weaker versions of the unconfoundedness assumption (for example, Heckman, Ichimura, and Todd, 1998). If the interest is in the PATE, it is sufficient to assume

ASSUMPTION 2.3 (MEAN INDEPENDENCE):

E[Y(w) | W, X] = E[Y(w) | X],

for w = 0, 1.

Although this assumption is unquestionably weaker, in practice it is rare that a convincing case is made for the weaker assumption 2.3 without the case being equally strong for the stronger version 2.1. The reason is that the weaker assumption is intrinsically tied to functional-form assumptions, and as a result one cannot identify average effects on transformations of the original outcome (such as logarithms) without the stronger assumption.
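A small numeric example of this last point: one can construct a Y(0) whose conditional mean is the same in both treatment arms, so that mean independence (assumption 2.3) holds, while its dispersion differs across arms, so that full independence (assumption 2.1) fails; the mean of log Y(0) then differs across arms. The lognormal construction below is my own illustration (no covariates, for simplicity).

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed construction: Y(0) lognormal with E[Y(0)] = exp(mu + s^2/2)
# held fixed at exp(0.5) in both arms, but dispersion s differing by arm.
N = 400_000
W = rng.binomial(1, 0.5, N)
s = np.where(W == 1, 1.0, 0.5)          # dispersion differs across arms
mu_ln = 0.5 - s**2 / 2                  # chosen so E[Y(0) | W] is equal
Y0 = np.exp(mu_ln + s * rng.normal(0, 1, N))

# Mean independence holds: levels gap ~ 0.
mean_gap = Y0[W == 1].mean() - Y0[W == 0].mean()

# But E[log Y(0) | W] = mu_ln differs: 0.0 vs 0.375, gap -0.375,
# so the "effect" on log Y(0) would be misidentified under 2.3 alone.
log_gap = np.log(Y0[W == 1]).mean() - np.log(Y0[W == 0]).mean()
```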
One can weaken the unconfoundedness assumption in a different direction if one is only interested in the average effect for the treated (see, for example, Heckman, Ichimura, & Todd, 1997). In that case one need only assume

ASSUMPTION 2.4 (UNCONFOUNDEDNESS FOR CONTROLS):

Y(0) ⊥ W | X,

and the weaker overlap assumption

ASSUMPTION 2.5 (WEAK OVERLAP):

Pr(W = 1 | X) < 1.

These two assumptions are sufficient for identification of PATT and SATT, because the moments of the distribution of Y(1) for the treated are directly estimable.

An important result building on the unconfoundedness assumption shows that one need not condition simulta-
³ If we are interested in the average effect for firms that did adopt the new technology (PATT), the following assumptions can be weakened slightly.

the authors make a second assumption regarding the joint distribution of treatments and covariates:ASSUMPTION 2.2 (OVERLAP):0 , Pr~W 5 1uX! , 1.For many of the formal results one will also need smoothness assumptions on the conditional regression functions and the propensity score [mw( x) and e( x)], and moment conditions on Y(w).