ON THE RATE OF CONVERGENCE OF OPTIMAL SOLUTIONS OF MONTE CARLO APPROXIMATIONS OF STOCHASTIC PROGRAMS

ALEXANDER SHAPIRO* AND TITO HOMEM-DE-MELLO†
Abstract. In this paper we discuss Monte Carlo simulation based approximations of a stochastic programming problem. We show that if the corresponding random functions are convex piecewise linear and the distribution is discrete, then (under mild additional assumptions) an optimal solution of the approximating problem provides an exact optimal solution of the true problem with probability one for sufficiently large sample size. Moreover, by using the theory of Large Deviations, we show that the probability of such an event approaches one exponentially fast with increase of the sample size. In particular, this happens in the case of two stage stochastic programming with recourse if the corresponding distributions are discrete. The obtained results suggest that, in such cases, Monte Carlo simulation based methods could be very efficient. We present some numerical examples to illustrate the ideas involved.
Key words. Two-stage stochastic programming with recourse, Monte Carlo simulation, Large Deviations theory, convex analysis

AMS subject classifications. 90C15, 90C25
1. Introduction. We discuss in this paper Monte Carlo approximations of stochastic programming problems of the form

$$\min_{x \in \Theta} \left\{ f(x) := \mathbb{E}_P[h(x, \omega)] \right\}, \tag{1.1}$$
where $P$ is a probability measure on a sample space $(\Omega, \mathcal{F})$, $\Theta$ is a subset of $\mathbb{R}^m$ and $h : \mathbb{R}^m \times \Omega \to \mathbb{R}$ is a real valued function. We refer to the above problem as the "true" optimization problem. By generating an independent identically distributed (i.i.d.) random sample $\omega^1, \ldots, \omega^N$ in $(\Omega, \mathcal{F})$, according to the distribution $P$, one can construct the corresponding approximating program
$$\min_{x \in \Theta} \Big\{ \hat{f}_N(x) := N^{-1} \sum_{j=1}^N h(x, \omega^j) \Big\}. \tag{1.2}$$
An optimal solution $\hat{x}_N$ of (1.2) provides an approximation (an estimator) of an optimal solution of the true problem (1.1).
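To make the construction concrete, here is a minimal sketch (not from the paper) of how the approximating program (1.2) can be set up and solved; the integrand h, the sampler, and the use of scipy.optimize.minimize as a general-purpose solver are all illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def saa_solve(h, sampler, x0, N, seed=0):
    """Minimize the sample average f_N(x) = (1/N) sum_j h(x, w^j),
    a deterministic surrogate of the true problem (1.1)."""
    rng = np.random.default_rng(seed)
    sample = [sampler(rng) for _ in range(N)]           # w^1, ..., w^N i.i.d. ~ P
    f_N = lambda x: np.mean([h(x, w) for w in sample])  # SAA objective (1.2)
    return minimize(f_N, x0, method="Nelder-Mead").x    # an optimal solution hat{x}_N

# Illustrative data: h(x, w) = |w - x| with w uniform on [-1, 1];
# the true optimal solution is the median of P, i.e. x* = 0.
x_hat = saa_solve(lambda x, w: abs(w - x[0]),
                  lambda rng: rng.uniform(-1.0, 1.0),
                  x0=np.array([0.5]), N=1000)
print(x_hat)  # close to 0 for large N
```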
There are numerous publications where various aspects of convergence properties of $\hat{x}_N$ are discussed. Suppose that the true problem has a non empty set $A$ of optimal solutions. It is possible to show that, under mild regularity conditions, the distance $\mathrm{dist}(\hat{x}_N, A)$, from $\hat{x}_N$ to the set $A$, converges with probability one (w.p.1) to zero as $N \to \infty$. There is a vast literature in Statistics dealing with such consistency properties of empirical estimators. In the context of stochastic programming we can mention recent works [9], [14], [17], where this problem is approached from the point of view of the epiconvergence theory.
* School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA. Email: ashapiro@isye.gatech.edu. This work was supported, in part, by grant DMI-9713878 from the National Science Foundation.
† Department of Industrial, Welding and Systems Engineering, The Ohio State University, Columbus, Ohio 43210-1271, USA. Email: homem-de-mello.1@osu.edu
It is also possible to give various estimates of the rate of convergence of $\hat{x}_N$ to $A$. Central Limit Theorem type results give such estimates of order $O_p(N^{-1/2})$ for the distance $\mathrm{dist}(\hat{x}_N, A)$ (e.g., [15], [20]), and the Large Deviations theory shows that one may expect that, for any given $\varepsilon > 0$, the probability of the event $\mathrm{dist}(\hat{x}_N, A) \ge \varepsilon$ approaches zero exponentially fast as $N \to \infty$ (see, e.g., [13], [16], [19]). These are general results and it seems that they describe the situation quite accurately when the involved distributions are continuous. However, it appears that the asymptotics are completely different if the distributions are discrete. We show that in such cases, under rather natural assumptions, the approximating problem (1.2) provides an exact optimal solution of the true problem (1.1) for $N$ large enough. That is, $\hat{x}_N \in A$ w.p.1 for sufficiently large $N$. Even more surprisingly, we show that the probability of the event $\{\hat{x}_N \notin A\}$ tends to zero exponentially fast as $N \to \infty$. That is what happens in the case of two stage stochastic programming with recourse if the corresponding distributions are discrete. This indicates that, in such cases, Monte Carlo simulation based methods could be very efficient.
In order to motivate the discussion, let us consider the following simple example. Let $Y_1, \ldots, Y_m$ be independent identically distributed real valued random variables. Consider the following optimization problem

$$\min_{x \in \mathbb{R}^m} \Big\{ f(x) := \mathbb{E}\Big[ \sum_{i=1}^m |Y_i - x_i| \Big] \Big\}. \tag{1.3}$$
This problem is a particular case of two stage stochastic programming with simple recourse. Clearly the objective function $f(x)$ can be written in the form $f(x) := \sum_{i=1}^m f_i(x_i)$, where $f_i(x_i) := \mathbb{E}\{|Y_i - x_i|\}$. Therefore the above optimization problem is separable. It is well known that a minimizer of $f_i(\cdot)$ is given by the median of the distribution of $Y_i$. Suppose that the distribution of the random variables $Y_i$ is symmetrical around zero. Then $x^* := (0, \ldots, 0)$ is an optimal solution of (1.3).
Now let $Y^1, \ldots, Y^N$ be an i.i.d. random sample of $N$ realizations of the random vector $Y = (Y_1, \ldots, Y_m)$. Consider the following sample average approximation of (1.3):

$$\min_{x \in \mathbb{R}^m} \Big\{ \hat{f}_N(x) := N^{-1} \sum_{j=1}^N h(x, Y^j) \Big\}, \tag{1.4}$$
where $h(x, y) := \sum_{i=1}^m |y_i - x_i|$, with $x, y \in \mathbb{R}^m$. An optimal solution of the above approximating problem (1.4) is given by $\hat{x}_N := (\hat{x}_{1N}, \ldots, \hat{x}_{mN})$, where $\hat{x}_{iN}$ is the sample median of $Y^1_i, \ldots, Y^N_i$.
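In this separable instance the approximating problem (1.4) is therefore solved coordinatewise by sample medians, with no numerical optimizer needed. A small sketch (the uniform distribution is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 100, 50
# Rows are the realizations Y^1, ..., Y^N; each coordinate Y_i is
# symmetric around zero (uniform on [-1, 1], purely for illustration).
Y = rng.uniform(-1.0, 1.0, size=(N, m))

# hat{x}_N = vector of coordinatewise sample medians: an exact optimal
# solution of the approximating problem (1.4).
x_hat = np.median(Y, axis=0)
print(np.max(np.abs(x_hat)))  # worst-coordinate deviation from x* = 0
```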
Suppose for the moment that $m = 1$, i.e. we are minimizing $\mathbb{E}\{|Y - x|\}$ over $x \in \mathbb{R}$. We assume that the distribution of $Y$ is symmetrical around zero and hence $x^* = 0$ is an optimal solution of the true problem. Suppose now that the distribution of $Y$ is continuous with density function $g(y)$. Then it is well known (e.g., [6]) that the corresponding sample median $\hat{x}_N$ is asymptotically normal. That is, $N^{1/2}(\hat{x}_N - x^*)$ converges in distribution to normal with zero mean and variance $[2g(x^*)]^{-2}$. For example, if $Y$ is uniformly distributed on the interval $[-1, 1]$, then $N^{1/2}(\hat{x}_N - x^*) \Rightarrow N(0, 1)$. This means that for $N = 100$ we may expect $\hat{x}_N$ to be in the (so-called confidence) interval $[-0.2, 0.2]$ with probability of about 95%. Now for $m > 1$ we have that the events $\hat{x}_{iN} \in [-0.2, 0.2]$, $i = 1, \ldots, m$, are independent (this is because we assume that the $Y_i$ are independent). Therefore the probability that each sample median $\hat{x}_{iN}$ will be inside the interval $[-0.2, 0.2]$ is about $0.95^m$. For example, for $m = 50$, this probability becomes $0.95^{50} = 0.077$. If we want that probability to be about 0.95 we have to increase the interval to $[-0.3, 0.3]$, which constitutes 30% of the range of the random variable $Y$. In other words, for that sample size and with $m = 50$ our sample estimate will not be accurate.
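The 95% figure and the joint coverage $0.95^m$ follow from the normal approximation $\hat{x}_{iN} \approx N(0, 1/N)$ used in the text; a quick check (the interval half-widths 0.2 and 0.3 are the ones discussed above):

```python
from scipy.stats import norm

N, m = 100, 50
sd = N ** -0.5  # asymptotic standard deviation of the sample median here

for half_width in (0.2, 0.3):
    # Per-coordinate coverage P(|x_hat_i| <= half_width), then the joint
    # coverage over m independent coordinates.
    p_one = norm.cdf(half_width / sd) - norm.cdf(-half_width / sd)
    print(half_width, round(p_one, 4), round(p_one ** m, 4))
# Prints roughly: 0.2 -> per-coordinate 0.9545, joint 0.0975
#                 0.3 -> per-coordinate 0.9973, joint 0.8736
```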
The situation becomes quite different if we assume that $Y$ has a discrete distribution. Suppose now that $Y$ can take values $-1$, $0$ and $1$ with equal probabilities $1/3$. In that case the true problem has unique optimal solution $x^* = 0$. The corresponding sample estimate $\hat{x}_N$ can be equal to $-1$, $0$ or $1$. We have that the event $\{\hat{x}_N = 1\}$ happens if more than half of the sample points are equal to one. The probability of that is given by $P(X > N/2)$, where $X$ has a binomial distribution $B(N, 1/3)$. If exactly half of the sample points are equal to one, then the sample estimate can be any number in the interval $[0, 1]$. Similar conclusions hold for the event $\{\hat{x}_N = -1\}$. Therefore the probability that $\hat{x}_N = 0$ is at least $1 - 2P(X \ge N/2)$. For $N = 100$, this probability is $0.9992$. Therefore the probability that the sample estimate $\hat{x}_N$, given by an optimal solution of the approximating problem (1.4) with the sample size $N = 100$ and the number of random variables $m = 50$, is exactly equal to the true optimal solution is at least $0.9992^{50} = 0.96$. With the sample size $N = 120$ and the number of random variables $m = 200$ this probability, of $\hat{x}_N = 0$, is about $0.9998^{200} = 0.95$. Note that the number of scenarios for that problem is $3^{200}$, which is not small by any standard. And yet with a sample size of only 120 the approximating problem produces an estimator which is exactly equal to the true optimal solution with probability of 95%.
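These binomial bounds are easy to reproduce; a minimal sketch using scipy.stats.binom, where sf(k) returns P(X > k), so P(X >= N/2) is sf(N/2 - 1) for even N:

```python
from scipy.stats import binom

for N, m in ((100, 50), (120, 200)):
    # X ~ B(N, 1/3) counts the sample points equal to one (or to minus one).
    p_tail = binom.sf(N // 2 - 1, N, 1.0 / 3.0)  # P(X >= N/2)
    p_zero = 1.0 - 2.0 * p_tail                  # lower bound on P(x_hat_i = 0)
    print(N, m, round(p_zero, 4), round(p_zero ** m, 4))
# N=100: p_zero ~ 0.9992, and over m=50 independent coordinates ~ 0.96
# N=120: p_zero ~ 0.9998, and over m=200 independent coordinates ~ 0.95
```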
The above problem, although simple, illustrates the phenomenon of exponential convergence referred to in the title of the paper. In the above example the corresponding probabilities can be calculated in a closed form, but in the general case of course we cannot expect to do so. The purpose of this paper is to extend this discussion to a class of stochastic programming problems satisfying some assumptions. Our goal is to exhibit some qualitative (rather than quantitative) results. We do not propose an algorithm, but rather show asymptotic properties of Monte Carlo simulation based methods.

The paper is organized as follows. In section 2 we show almost sure (w.p.1) occurrence of the event $\{\hat{x}_N \in A\}$ (recall that $A$ is the set of optimal solutions of the "true" problem). In section 3 we take a step further and, using techniques from Large Deviations theory, we show that the probability of that event approaches one exponentially fast. In section 4 we discuss the median problem in more detail, and present some numerical results for a two-stage stochastic programming problem with complete recourse. Finally, section 5 presents some conclusions.
2. Almost sure convergence. Consider the "true" stochastic programming problem (1.1). For the sake of simplicity we assume that the corresponding expected value function $f(x) := \mathbb{E}_P h(x, \omega)$ exists (and in particular is finite valued) for all $x \in \mathbb{R}^m$. For example, if the probability measure $P$ has a finite support (i.e. the distribution $P$ is discrete and can take a finite number of different values), and hence the space $\Omega$ can be taken to be finite, say $\Omega := \{\omega_1, \ldots, \omega_K\}$, and $P$ is given by the probabilities $P\{\omega = \omega_k\} = p_k$, $k = 1, \ldots, K$, we have

$$\mathbb{E}_P h(x, \omega) = \sum_{k=1}^K p_k h(x, \omega_k). \tag{2.1}$$
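When $P$ has finite support, (2.1) is a finite weighted sum, and the sample average in (1.2) simply replaces the true weights $p_k$ by empirical frequencies. A small sketch (the integrand h and the scenario values are illustrative placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
scenarios = np.array([-1.0, 0.0, 1.0])  # omega_1, ..., omega_K (placeholder values)
p = np.array([1 / 3, 1 / 3, 1 / 3])     # p_1, ..., p_K

h = lambda x, w: abs(w - x)             # placeholder convex integrand

x = 0.25
f_true = sum(pk * h(x, wk) for pk, wk in zip(p, scenarios))  # the sum (2.1)

sample = rng.choice(scenarios, size=1000, p=p)               # i.i.d. draws from P
f_N = np.mean([h(x, w) for w in sample])                     # SAA value at x
print(f_true, f_N)  # f_N -> f_true w.p.1 as N grows (Law of Large Numbers)
```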
We assume that the feasible set $\Theta$ is closed and convex, and that for every $\omega \in \Omega$, the function $h(\cdot, \omega)$ is convex. This implies that the expected value function $f(\cdot)$ is also convex, and hence the "true" problem (1.1) is convex. Also if $P$ is discrete and the functions $h(\cdot, \omega_k)$, $k = 1, \ldots, K$, are piecewise linear and convex, then $f(\cdot)$ is piecewise linear and convex. That is what happens in two stage stochastic programming with a finite number of scenarios.
Let $\omega^1, \ldots, \omega^N$ be an i.i.d. random sample in $(\Omega, \mathcal{F})$, generated according to the distribution $P$, and consider the corresponding approximating program (1.2). Note that, since the functions $h(\cdot, \omega^j)$ are convex, the approximating (sample average) function $\hat{f}_N(\cdot)$ is also convex, and hence the approximating program (1.2) is convex.

We show in this section that, under some natural assumptions which hold for instance in the case of two stage stochastic programming with a finite number of scenarios, with probability one (w.p.1) for $N$ large enough any optimal solution of the approximating problem (1.2) belongs to the set of optimal solutions of the true problem (1.1). That is, problem (1.2) yields an exact optimal solution (w.p.1) when $N$ is sufficiently large.

The statement "w.p.1 for $N$ large enough" should be understood in the sense that for $P$-almost every $\omega \in \Omega$ there exists $N^* = N^*(\omega)$ such that for any $N \ge N^*$ the corresponding statement holds. The number $N^*$ is a function of $\omega$, i.e. depends on the random sample, and therefore in itself is random. Note also that, since convergence w.p.1 implies convergence in probability, the above statement implies that the probability of the corresponding event tends to one as the sample size $N$ tends to infinity.
We denote by $A$ the set of optimal solutions of the true problem (1.1), and by $f'(x, d)$ the directional derivative of $f$ at $x$ in the direction $d$. Note that the set $A$ is convex and closed, and since $f$ is a real valued convex function, the directional derivative $f'(x, d)$ exists, for all $x$ and $d$, and is convex in $d$. We discuss initially the case when $A$ is a singleton; later we will consider the general setting.
Assumption (A) The true problem (1.1) possesses a unique optimal solution $x^*$, i.e. $A = \{x^*\}$, and there exists a positive constant $c$ such that

$$f(x) \ge f(x^*) + c\,\|x - x^*\| \quad \forall\, x \in \Theta. \tag{2.2}$$
Of course condition (2.2), in itself, implies that $x^*$ is the unique optimal solution of (1.1). In the approximation theory optimal solutions satisfying (2.2) are called sharp minima. It is not difficult to show, since problem (1.1) is convex, that assumption (A) holds if and only if

$$f'(x^*, d) > 0 \quad \forall\, d \in T_\Theta(x^*) \setminus \{0\}, \tag{2.3}$$

where $T_\Theta(x^*)$ denotes the tangent cone to $\Theta$ at $x^*$.
In particular, if $f(x)$ is differentiable at $x^*$, then assumption (A) (or equivalently (2.3)) holds if and only if $-\nabla f(x^*)$ belongs to the interior of the normal cone to $\Theta$ at $x^*$. Note that, since $f'(x^*, \cdot)$ is a positively homogeneous convex real valued (and hence continuous) function, it follows from (2.3) that $f'(x^*, d) \ge \varepsilon \|d\|$ for some $\varepsilon > 0$ and all $d \in T_\Theta(x^*)$. We refer to a recent paper [4], and references therein, for a discussion of that condition and some of its generalizations.
If the function $f(x)$ is piecewise linear and the set $\Theta$ is polyhedral, then problem (1.1) can be formulated as a linear programming problem, and the above assumption (A) always holds provided $x^*$ is the unique optimal solution of (1.1). This happens, for example, in the case of a two stage linear stochastic programming problem with a finite number of scenarios, provided it has a unique optimal solution. Note that assumption (A) is not restricted to such situations only. In fact, in some of our numerical experiments sharp minima (i.e. assumption (A)) happened quite often in the case of continuous (normal) distributions. Furthermore, because the problem is assumed to be convex, sharp minimality is equivalent to first order sufficient conditions. Under such conditions, the first order (i.e. linear) growth (2.2) of $f(x)$ holds globally, i.e. for all $x \in \Theta$.
Theorem 2.1. Suppose that: (i) for every $\omega \in \Omega$ the function $h(\cdot, \omega)$ is convex; (ii) the expected value function $f(\cdot)$ is well defined and is finite valued; (iii) the set $\Theta$ is closed and convex; (iv) assumption (A) holds. Then w.p.1 for $N$ large enough the approximating problem (1.2) has a unique optimal solution $\hat{x}_N$ and $\hat{x}_N = x^*$.
Proof of the above theorem is based on the following proposition. Results of that proposition (perhaps not exactly in that form) are basically known, but since its proof is simple we give it for the sake of completeness. Denote by $h'_\omega(x, d)$ the directional derivative of $h(\cdot, \omega)$ at the point $x$ in the direction $d$, and by $\mathbb{H}(B, C)$ the Hausdorff distance between sets $B, C \subset \mathbb{R}^m$, that is

$$\mathbb{H}(B, C) := \max\left\{ \sup_{x \in C} \mathrm{dist}(x, B), \; \sup_{x \in B} \mathrm{dist}(x, C) \right\}. \tag{2.4}$$
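For finite point sets the definition (2.4) translates directly into code; a minimal sketch:

```python
import numpy as np

def hausdorff(B, C):
    """Hausdorff distance (2.4) between finite sets of points in R^m,
    passed as arrays of shape (nB, m) and (nC, m)."""
    D = np.linalg.norm(B[:, None, :] - C[None, :, :], axis=2)  # pairwise distances
    return max(D.min(axis=0).max(),   # sup over x in C of dist(x, B)
               D.min(axis=1).max())   # sup over x in B of dist(x, C)

B = np.array([[0.0], [1.0]])
C = np.array([[0.0], [3.0]])
print(hausdorff(B, C))  # 2.0: the point 3 in C is at distance 2 from B
```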
Proposition 2.2. Suppose that the assumptions (i) and (ii) of Theorem 2.1 are satisfied. Then, for any $x, d \in \mathbb{R}^m$, the following holds:

$$f'(x, d) = \mathbb{E}_P\{h'_\omega(x, d)\}, \tag{2.5}$$

$$\lim_{N \to \infty} \, \sup_{\|d\| \le 1} \left| f'(x, d) - \hat{f}'_N(x, d) \right| = 0 \quad \text{w.p.1}, \tag{2.6}$$

$$\lim_{N \to \infty} \mathbb{H}\big( \partial \hat{f}_N(x), \partial f(x) \big) = 0 \quad \text{w.p.1}. \tag{2.7}$$
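Formula (2.5) and the convergence behind (2.6) can be illustrated numerically in the one-dimensional case $h(x, \omega) = |\omega - x|$, for which a short computation from the definition (ours, not from the excerpt) gives $h'_\omega(x, d) = -d \, \mathrm{sign}(\omega - x)$ for $\omega \neq x$ and $h'_x(x, d) = |d|$:

```python
import numpy as np

rng = np.random.default_rng(2)

def h_prime(x, d, w):
    # Directional derivative of x -> |w - x| at x in the direction d.
    return abs(d) if w == x else -d * np.sign(w - x)

x = 0.0
omegas = np.array([-1.0, 0.0, 1.0])  # the discrete distribution from section 1
f_prime = lambda d: np.mean([h_prime(x, d, w) for w in omegas])  # (2.5), p_k = 1/3

sample = rng.choice(omegas, size=5000)
fN_prime = lambda d: np.mean([h_prime(x, d, w) for w in sample])  # sample average

for d in (1.0, -1.0):
    print(d, f_prime(d), fN_prime(d))  # both close to |d|/3 here
```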
Proof. Since $f(\cdot)$ is convex we have that

$$f'(x, d) = \inf_{t > 0} \frac{f(x + td) - f(x)}{t}, \tag{2.8}$$

and the ratio in the right hand side of (2.8) decreases monotonically as $t$ decreases to zero, and similarly for the functions $h(\cdot, \omega)$. It follows then by the Monotone Convergence Theorem that

$$f'(x, d) = \mathbb{E}_P\left\{ \inf_{t > 0} \frac{h(x + td, \omega) - h(x, \omega)}{t} \right\}, \tag{2.9}$$

and hence the right hand side of (2.5) is well defined and the equation follows.
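The monotone decrease of the difference quotients in (2.8), which is what licenses the Monotone Convergence Theorem here, is easy to observe numerically; a quick sketch with an arbitrary convex choice of f:

```python
f = lambda x: x * x + abs(x)   # an arbitrary convex function with f'(0; 1) = 1
x, d = 0.0, 1.0

ts = [2.0, 1.0, 0.5, 0.25, 0.125]                      # t decreasing to zero
quotients = [(f(x + t * d) - f(x)) / t for t in ts]
print(quotients)  # [3.0, 2.0, 1.5, 1.25, 1.125] -> f'(x; d) = 1
assert all(a >= b for a, b in zip(quotients, quotients[1:]))  # monotone decrease
```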
We have that

$$\hat{f}'_N(x, d) = N^{-1} \sum_{j=1}^N h'_{\omega^j}(x, d). \tag{2.10}$$

Therefore by the strong form of the Law of Large Numbers it follows from (2.5) that for any $d \in \mathbb{R}^m$, $\hat{f}'_N(x, d)$ converges to $f'(x, d)$ w.p.1 as $N \to \infty$. Consequently, for any countable set $D \subset \mathbb{R}^m$ we have that the event "$\lim_{N \to \infty} \hat{f}'_N(x, d) = f'(x, d)$ for all $d \in D$" happens w.p.1. Let us take a countable and dense subset $D$ of $\mathbb{R}^m$. Recall that if a sequence of real valued convex functions converges pointwise on a dense subset of $\mathbb{R}^m$, then it converges uniformly on compact subsets of $\mathbb{R}^m$.

References

Nonlinear Programming.
A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications.
J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms.
J.F. Bonnans and A. Shapiro, Perturbation Analysis of Optimization Problems.
Large Deviations.