Partial Identification in Triangular Systems of Equations With Binary Dependent Variables

doi:10.3982/ECTA9082


Azeem M. Shaikh

Department of Economics

University of Chicago

amshaikh@uchicago.edu

Edward J. Vytlacil

Department of Economics

Yale University

edward.vytlacil@yale.edu

July 15, 2010

Abstract

This paper studies models for binary outcome variables that contain a binary endogenous regressor. More specifically, we consider a nonparametric, triangular system of equations with binary dependent variables. The main assumption we impose is a weak separability condition on each equation, or, equivalently, a threshold crossing model on each equation. In this setting, we construct upper and lower bounds on the Average Structural Function (ASF) and the Average Treatment Effect (ATE) under weak regularity conditions. The resulting bounds are narrower the greater the strength of the instrument and the greater the degree to which the exogenous covariates that enter the outcome equation can compensate for variation in the endogenous regressor. We show further that the bounds on the ASF and ATE are sharp under an additional restriction on the support of the covariates and the instrument.

JEL Codes: C14, C35

KEYWORDS: Partial Identification, Simultaneous Equation Model, Binary Dependent Variable, Endogeneity, Threshold Crossing Model, Weak Separability, Average Structural Function, Average Treatment Effect

ACKNOWLEDGEMENTS: We would like to thank Hide Ichimura, Jim Heckman, Whitney Newey, and Jim Powell for very helpful comments on this paper. This research was conducted in part while Edward Vytlacil was in residence at Hitotsubashi University. This research was supported by NSF SES-05-51089 and DMS-0820310.

An earlier version of this paper, titled “Threshold Crossing Models and Bounds on Treatment Effects: A Nonparametric Analysis,” appeared in May 2005 as NBER Technical Working Paper 307.


1 Introduction

This paper studies models for binary outcome variables that contain a binary endogenous regressor. More specifically, we consider a nonparametric, triangular system of equations with binary dependent variables. The main assumption we impose is a weak separability condition on each equation, or, equivalently, a threshold crossing model on each equation. This structure nests the bivariate probit model with structural shift of Heckman (1978) as a special case. In this setting, we consider the problem of partially identifying the Average Structural Function (ASF) and the Average Treatment Effect (ATE), thereby extending the identification results of Vytlacil and Yildiz (2007).

In order to define this structure precisely, let D denote the binary endogenous regressor and let Y denote the outcome of interest. For example, D might denote receipt of job training and Y later employment, or D might denote receipt of a medical intervention and Y later mortality. See Bhattacharya et al. (2009) for an application of the methodology developed in this paper to the evaluation of the impact of Swan-Ganz catheterization on patient mortality. Consider the following triangular system of equations:

$$
\begin{aligned}
Y &= g_1(D, X, \epsilon_1) \\
D &= g_2(Z, \epsilon_2) .
\end{aligned}
\tag{1}
$$

Here, X and Z are observed random vectors that may share elements in common, and $\epsilon_1$ and $\epsilon_2$ are unobserved random variables. Following Blundell and Powell (2004), our object of interest is the Average Structural Function (ASF)

$$
G_1(d, x) = \int g_1(d, x, \epsilon_1) \, dF_{\epsilon_1} ,
$$

where (d, x) denotes a potential realization of the random vector (D, X). The ASF averages against the unconditional distribution of $\epsilon_1$, not the distribution of $\epsilon_1$ conditional on the possibly endogenous regressor D, and thus gives the expected value of Y if D were determined exogenously.

We also consider

$$
\Delta G_1(x) = G_1(1, x) - G_1(0, x) ,
$$

which is often referred to as the Average Treatment Effect (ATE) in the treatment effect literature.
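To make the definitions of $G_1$ and $\Delta G_1$ concrete, the Python sketch below assumes, purely for illustration, that $g_1(d, x, \epsilon_1) = I\{\nu_1(d, x) \ge \epsilon_1\}$ with a hypothetical linear index $\nu_1(d, x) = 0.5 + d - 0.8x$ and $\epsilon_1 \sim N(0, 1)$; none of these choices come from the paper. It computes the ASF by averaging $g_1$ over draws from the unconditional distribution of $\epsilon_1$ and checks the result against the closed form $\Phi(\nu_1(d, x))$ implied by these assumptions.

```python
import math
import random


def nu1(d, x):
    # Hypothetical linear index for illustration; not taken from the paper.
    return 0.5 + 1.0 * d - 0.8 * x


def g1(d, x, eps1):
    # Outcome equation in threshold crossing form: I{nu1(d, x) >= eps1}.
    return 1 if nu1(d, x) >= eps1 else 0


def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def asf_closed_form(d, x):
    # With eps1 ~ N(0, 1): G1(d, x) = Pr{eps1 <= nu1(d, x)} = Phi(nu1(d, x)).
    return std_normal_cdf(nu1(d, x))


def asf_monte_carlo(d, x, n=100_000, seed=0):
    # Average g1 over draws from the unconditional distribution of eps1.
    rng = random.Random(seed)
    return sum(g1(d, x, rng.gauss(0.0, 1.0)) for _ in range(n)) / n


def ate(x):
    # Delta G1(x) = G1(1, x) - G1(0, x)
    return asf_closed_form(1, x) - asf_closed_form(0, x)


x = 0.25
print(f"G1(1, x): closed form {asf_closed_form(1, x):.3f}, "
      f"Monte Carlo {asf_monte_carlo(1, x):.3f}")
print(f"ATE at x = {x}: {ate(x):.3f}")
```

Because the draws of $\epsilon_1$ never condition on D, the Monte Carlo average targets the ASF rather than the conditional mean of Y given D = d.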

The main assumption we impose is that $g_1$ and $g_2$ both satisfy weak separability of the observed regressors from the unobserved error term. As will be further discussed in Section 2, for a binary dependent variable, such an assumption is equivalent to assuming that the function is weakly increasing in the error term, as in Chesher (2005), assuming the monotonicity restriction considered by Imbens and Angrist (1994), or assuming that the model can be represented as a threshold crossing model with an additively separable latent error, as in Heckman and Vytlacil (2005). For ease of analysis, we will work with the threshold crossing representation of the model, i.e.,

$$
\begin{aligned}
Y &= I\{\nu_1(D, X) \ge \epsilon_1\} \\
D &= I\{\nu_2(Z) \ge \epsilon_2\} .
\end{aligned}
\tag{2}
$$

If one assumes that $\nu_1$ and $\nu_2$ are linear functions and that $(\epsilon_1, \epsilon_2)$ has a bivariate normal distribution, then the above model reduces to the classical bivariate probit with structural shift considered in Heckman (1978). We will not impose any such parametric functional form or parametric distributional assumptions in this paper.
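The bivariate probit special case also shows how D can be endogenous even when Z is independent of the errors. The Python sketch below simulates (2) under hypothetical choices that are not from the paper: X is omitted, $\nu_1(d) = -0.2 + 0.6d$, $\nu_2(z) = -0.5 + z$ with binary Z, and $(\epsilon_1, \epsilon_2)$ bivariate standard normal with an assumed correlation of 0.6. It then compares the naive conditional probability $\Pr\{Y = 1 \mid D = 1\}$ with the ASF $G_1(1) = \Phi(\nu_1(1))$.

```python
import math
import random

RHO = 0.6  # hypothetical correlation between eps1 and eps2 (source of endogeneity)


def nu1(d):
    return -0.2 + 0.6 * d   # hypothetical outcome index, X omitted for simplicity


def nu2(z):
    return -0.5 + 1.0 * z   # hypothetical first-stage index


def draw_errors(rng):
    # Bivariate standard normal with correlation RHO via a Cholesky factor.
    u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    eps2 = u2
    eps1 = RHO * u2 + math.sqrt(1.0 - RHO ** 2) * u1
    return eps1, eps2


def simulate(n, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.randint(0, 1)              # instrument, drawn independently of errors
        eps1, eps2 = draw_errors(rng)
        d = 1 if nu2(z) >= eps2 else 0     # D = I{nu2(Z) >= eps2}
        y = 1 if nu1(d) >= eps1 else 0     # Y = I{nu1(D) >= eps1}
        data.append((y, d, z))
    return data


def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


data = simulate(100_000)
treated = [y for y, d, _ in data if d == 1]
naive = sum(treated) / len(treated)   # Pr{Y = 1 | D = 1}
asf_1 = std_normal_cdf(nu1(1))        # G1(1) = Phi(nu1(1))
print(f"Pr(Y=1|D=1) = {naive:.3f}, G1(1) = {asf_1:.3f}")
```

In this design units with D = 1 tend to have low $\epsilon_2$ and, through the positive correlation, low $\epsilon_1$, so the naive conditional probability overstates $G_1(1)$.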

In addition to the weak separability assumption described above, we will require some mild regularity of the distribution of $(\epsilon_1, \epsilon_2)$. We will also assume that X and Z are exogenous in the sense that $(X, Z) \perp\!\!\!\perp (\epsilon_1, \epsilon_2)$. Note that D may still be endogenous in the Y equation due to possible dependence between $\epsilon_1$ and $\epsilon_2$. For example, those who receive the job training might have the worst human capital, or those who receive the medical intervention might have the worst latent health. The resulting bounds on the ASF or ATE are substantially narrower than alternative bounds that do not impose our weak separability restrictions. Under certain restrictions on the distribution of (X, Z) and the functions $\nu_1$ and $\nu_2$ in (2), we show further that the bounds we derive on the ASF and ATE are sharp in the sense that for any value lying between the upper and lower bounds, there will exist a distribution of unobservable variables satisfying all of the assumptions of our analysis that is consistent with both the distribution of the observed data and the proposed value of the ASF or the ATE.

Identification of the ASF and ATE with this structure was previously considered by Vytlacil and Yildiz (2007). They show that when the support of the distribution of X conditional on $\Pr\{D = 1 \mid Z\}$ is sufficiently rich, it is possible to point identify the ASF and the ATE. Their support condition will fail if, for example, X is a discrete random variable, and would be expected to fail near the boundaries of the support of X if X has bounded support. In this paper, we investigate what can be inferred about the ASF or the ATE without imposing this support restriction. To this end, we first use a modified instrumental variable-like procedure to determine what variation in X over-compensates or under-compensates for ceteris paribus variation in D, and then use this information to construct bounds on the ASF or the ATE. The resulting bounds are narrower the greater the variation in X conditional on $\Pr\{D = 1 \mid Z\}$, and they collapse to point identification under the Vytlacil and Yildiz (2007) condition of sufficient variation in X conditional on $\Pr\{D = 1 \mid Z\}$.

As mentioned earlier, our weak separability restriction on the functions $g_1$ and $g_2$ is equivalent to imposing that the functions are weakly increasing in the error terms $\epsilon_1$ and $\epsilon_2$, respectively. We do not impose the stronger requirement that either function is strictly increasing in its error term, since, under our regularity conditions on the distribution of $(\epsilon_1, \epsilon_2)$, doing so would imply that Y or D must be continuously distributed. For this reason, we cannot follow the control variate approach used, e.g., in Altonji and Matzkin (2005), Blundell and Powell (2004), Chesher (2003), and Imbens and Newey (2010), which would require $g_2$ to be strictly increasing in $\epsilon_2$. Similarly, we cannot follow the quantile instrumental variable approach used in Chernozhukov and Hansen (2005) and Chernozhukov et al. (2007), which would require $g_1$ to be strictly increasing in $\epsilon_1$.

Our analysis is similar to Chesher (2005), who only assumes that $g_1$ and $g_2$ are weakly increasing in $\epsilon_1$ and $\epsilon_2$, respectively. In his analysis, the object of interest is $g_1$ itself, while in this paper we focus more modestly on the ASF and the ATE. More importantly, his analysis requires a rank condition that cannot hold except in trivial cases when D is binary. When D is binary, the rank condition under which he constructs bounds for $g_1(0, x, \tau)$ is that there exists some value $z_0$ such that $\Pr\{D = 1 \mid Z = z_0\} \le \tau \le 0$, and the condition for $g_1(1, x, \tau)$ is that there exists some value $z_0$ such that $1 \le \tau \le \Pr\{D = 1 \mid Z = z_0\}$. Since $\Pr\{D = 1 \mid Z = z_0\} \in [0, 1]$, these conditions cannot hold for any value of $\tau$ except $\tau = 0$ or $\tau = 1$, in which case the ASF is identified following arguments in Heckman and Vytlacil (2001). See Jun et al. (2009) for extensions of his analysis and Chesher (2007) for related analysis that considers partial identification of $g_1$ without imposing any restrictions on $g_2$.

The analysis of this paper has recently been extended in subsequent work by Chiburis (2009). While we show that our bounds are sharp whenever the support of (X, Z) may be written as the product of the support of X and the support of Z, Chiburis (2009) shows that our bounds may not be sharp without this restriction. On the other hand, he presents numerical evidence suggesting that our bounds will often be close to the sharp bounds even when this restriction fails. Moreover, our bounds are much simpler to describe than the sharp bounds derived in Chiburis (2009). Chiburis (2009) also considers restrictions beyond what we impose, such as linear latent index restrictions and parametric distributional assumptions.

The remainder of the paper is organized as follows. In Section 2, we formally define our assumptions and analyze the connection between our assumptions and the assumptions considered in the previous literature. Our main results are contained in Section 3. We conclude with a numerical example in Section 4.

2 Model and Assumptions

In addition to assuming that Y and D are determined by (2), we will make use of the following assumptions in our analysis:

Assumption 2.1 $(X, Z) \perp\!\!\!\perp (\epsilon_1, \epsilon_2)$.

Assumption 2.2 The distribution of $(\epsilon_1, \epsilon_2)$ has a strictly positive density with respect to Lebesgue measure on $\mathbb{R}^2$.


Assumption 2.3 The support of the distribution of (X, Z), supp(X, Z), is compact.

Assumption 2.4 The functions $\nu_1(\cdot)$ and $\nu_2(\cdot)$ are continuous.

Assumption 2.5 The distribution of $\nu_2(Z) \mid X$ is nondegenerate.

In the derivation of our bounds, we will exploit the assumption that Y and D are determined by (2) and Assumptions 2.1 and 2.2. Formally, our analysis will not require Assumption 2.5, but when it fails our bounds will reduce to those of Manski (1989), who imposes no structure on the equations determining Y and D. In this sense, though formally our results will not require a variable in Z that is not in X, they will be nontrivial only when there is a variable in Z that is not contained in X. When this is the case, any regressor in X that is not in Z will provide an additional source of identifying power in our analysis. We will make use of Assumptions 2.3 and 2.4 only when arguing that the bounds are sharp.
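The structured bounds are developed in Section 3. For comparison only, the Python sketch below computes worst-case bounds of the no-structure kind referred to above, using nothing beyond the exogeneity of X in Assumption 2.1. Since $G_1(1, x) = \Pr\{Y = 1, D = 1 \mid X = x\} + \Pr\{\nu_1(1, x) \ge \epsilon_1, D = 0 \mid X = x\}$ and, without further structure, the second term is only known to lie in $[0, \Pr\{D = 0 \mid X = x\}]$, each ASF value lies in an interval (and symmetrically for $G_1(0, x)$). This is our illustration of the benchmark, not the paper's construction; the helper `worst_case_bounds` and the input probabilities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    lo: float
    hi: float


def worst_case_bounds(p11, p10, p_d1):
    """No-structure bounds at a fixed x (hypothetical helper).

    p11  = Pr{Y = 1, D = 1 | X = x}
    p10  = Pr{Y = 1, D = 0 | X = x}
    p_d1 = Pr{D = 1 | X = x}
    """
    if not (0 <= p11 <= p_d1 <= 1 and 0 <= p10 <= 1 - p_d1):
        raise ValueError("inconsistent probabilities")
    # G1(1, x): identified for the D = 1 group; anything in [0, 1] for D = 0.
    g1_1 = Interval(p11, p11 + (1 - p_d1))
    # G1(0, x): identified for the D = 0 group; anything in [0, 1] for D = 1.
    g1_0 = Interval(p10, p10 + p_d1)
    # ATE bounds: difference of the two intervals.
    ate = Interval(g1_1.lo - g1_0.hi, g1_1.hi - g1_0.lo)
    return g1_1, g1_0, ate


g1_1, g1_0, ate = worst_case_bounds(p11=0.35, p10=0.20, p_d1=0.5)
print(g1_1, g1_0, ate)
```

For any admissible inputs the ATE interval has width exactly one and contains zero, which illustrates why restrictions such as weak separability are needed for informative bounds.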

As discussed in Vytlacil (2006), the existence of a threshold crossing representation with an additive latent error as in (2) is equivalent to several other nonparametric monotonicity conditions considered in the literature. In fact, by combining results from the previous literature, we have the following lemma:

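Lemma 2.1 For $f : \mathcal{W} \times \mathcal{E} \to \{0, 1\}$, where $\mathcal{W} \subseteq \mathbb{R}^{K_{\mathcal{W}}}$ and $\mathcal{E} \subseteq \mathbb{R}^{K_{\mathcal{E}}}$, the following statements are equivalent:

(i) For any $w, \tilde{w} \in \mathcal{W}$, $f(w, e^*) > f(\tilde{w}, e^*)$ for some $e^* \in \mathcal{E}$ $\Rightarrow$ $f(w, e) \ge f(\tilde{w}, e)$ for all $e \in \mathcal{E}$.

(ii) There exists a function $\nu : \mathcal{E} \to \mathbb{R}$ with range $\mathcal{R}(\nu)$ and a function $g : \mathcal{W} \times \mathcal{R}(\nu) \to \mathbb{R}$ with $g(w, t)$ weakly increasing in $t$ such that $f(w, e) = g(w, \nu(e))$ for all $(w, e) \in \mathcal{W} \times \mathcal{E}$.

(iii) There exists a function $\nu : \mathcal{W} \to \mathbb{R}$ with range $\mathcal{R}(\nu)$ and a function $g : \mathcal{R}(\nu) \times \mathcal{E} \to \mathbb{R}$ with $g(t, e)$ weakly increasing in $t$ such that $f(w, e) = g(\nu(w), e)$ for all $(w, e) \in \mathcal{W} \times \mathcal{E}$.

(iv) There exists a function $\nu : \mathcal{W} \to \mathbb{R}$ and a function $\lambda : \mathcal{E} \to \mathbb{R}$ such that $f(w, e) = I\{\nu(w) \ge \lambda(e)\}$ for all $(w, e) \in \mathcal{W} \times \mathcal{E}$.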

Proof: The equivalence between (i) and (iii) follows from Theorem C.1 of Vytlacil and Yildiz (2007). The equivalences between (i) and (iv) and between (ii) and (iv) follow from straightforward modifications to the proof of Theorem 1 of Vytlacil (2002).
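Condition (i) is equivalent to requiring that the sets $S(w) = \{e : f(w, e) = 1\}$ be nested, i.e., for any two $w$ one set contains the other. The Python sketch below is a finite-grid illustration of the equivalence between (i) and (iv), not the proof cited above. It checks (i) directly and, when (i) holds, builds a threshold representation by setting $\nu(w) = |S(w)|$ and $\lambda(e)$ equal to the smallest $|S(w)|$ among the $w$ with $e \in S(w)$. The helper names and the example functions `f` and `g` are hypothetical, and the grids are scalar rather than subsets of $\mathbb{R}^K$.

```python
from itertools import product


def satisfies_condition_i(f, W, E):
    """Check condition (i) of Lemma 2.1 on finite grids W and E."""
    for w, w_tilde in product(W, W):
        # If f(w, .) exceeds f(w_tilde, .) at some e ...
        if any(f(w, e) > f(w_tilde, e) for e in E):
            # ... it must weakly exceed it at every e.
            if not all(f(w, e) >= f(w_tilde, e) for e in E):
                return False
    return True


def threshold_representation(f, W, E):
    """Return dicts nu, lam with f(w, e) == I{nu[w] >= lam[e]} on the grid.

    Assumes condition (i) holds, so the sets S(w) = {e : f(w, e) = 1}
    are nested and therefore ordered by their sizes.
    """
    nu = {w: sum(f(w, e) for e in E) for w in W}  # nu(w) = |S(w)|
    never_on = len(E) + 1                          # exceeds every |S(w)|
    lam = {}
    for e in E:
        sizes = [nu[w] for w in W if f(w, e) == 1]
        lam[e] = min(sizes) if sizes else never_on
    return nu, lam


# Hypothetical example: f(w, e) = I{w^2 >= e}, which satisfies (i).
W = range(-2, 3)
E = range(0, 5)


def f(w, e):
    return 1 if w * w >= e else 0


# Hypothetical counterexample: the sets S(w) are not nested.
def g(w, e):
    return 1 if (w + e) % 2 == 0 else 0


nu, lam = threshold_representation(f, W, E)
print(satisfies_condition_i(f, W, E))                               # True
print(all(f(w, e) == int(nu[w] >= lam[e]) for w in W for e in E))  # True
print(satisfies_condition_i(g, range(2), range(2)))                 # False
```

Ranking $w$ by $|S(w)|$ works only because nestedness makes set size a complete order on the sets; without (i), no such $\nu$ and $\lambda$ exist, which the counterexample `g` illustrates.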

Restriction (i) in Lemma 2.1 is imposed on the model for D by Imbens and Angrist (1994). Imbens and Angrist (1994) refer to this restriction as “monotonicity,” whereas Heckman and Vytlacil (2005) refer to it as a uniformity condition. This restriction on D without the corresponding
