ON THE RATE OF CONVERGENCE OF OPTIMAL SOLUTIONS OF MONTE CARLO APPROXIMATIONS OF STOCHASTIC PROGRAMS

ALEXANDER SHAPIRO* AND TITO HOMEM-DE-MELLO†
Abstract. In this paper we discuss Monte Carlo simulation based approximations of a stochastic programming problem. We show that if the corresponding random functions are convex piecewise linear and the distribution is discrete, then (under mild additional assumptions) an optimal solution of the approximating problem provides an exact optimal solution of the true problem with probability one for sufficiently large sample size. Moreover, by using the theory of Large Deviations, we show that the probability of such an event approaches one exponentially fast with increase of the sample size. In particular, this happens in the case of two stage stochastic programming with recourse if the corresponding distributions are discrete. The obtained results suggest that, in such cases, Monte Carlo simulation based methods could be very efficient. We present some numerical examples to illustrate the ideas involved.
Key words. Two-stage stochastic programming with recourse, Monte Carlo simulation, Large Deviations theory, convex analysis

AMS subject classifications. 90C15, 90C25
1. Introduction. We discuss in this paper Monte Carlo approximations of stochastic programming problems of the form

$$\min_{x \in \Theta} \left\{ f(x) := \mathbb{E}_P[h(x, \omega)] \right\}, \tag{1.1}$$
where $P$ is a probability measure on a sample space $(\Omega, \mathcal{F})$, $\Theta$ is a subset of $\mathbb{R}^m$ and $h : \mathbb{R}^m \times \Omega \to \mathbb{R}$ is a real valued function. We refer to the above problem as the "true" optimization problem. By generating an independent identically distributed (i.i.d.) random sample $\omega^1, \ldots, \omega^N$ in $(\Omega, \mathcal{F})$, according to the distribution $P$, one can construct the corresponding approximating program
$$\min_{x \in \Theta} \Big\{ \hat{f}_N(x) := N^{-1} \sum_{j=1}^N h(x, \omega^j) \Big\}. \tag{1.2}$$
An optimal solution $\hat{x}_N$ of (1.2) provides an approximation (an estimator) of an optimal solution of the true problem (1.1).
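To make the construction concrete, here is a minimal sketch (not from the paper) of how the approximating program (1.2) can be set up and solved; the integrand h, the sampler, and the use of scipy.optimize.minimize as a general-purpose solver are all illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def saa_solve(h, sampler, x0, N, seed=0):
    """Minimize the sample average f_N(x) = (1/N) sum_j h(x, w^j),
    a deterministic surrogate of the true problem (1.1)."""
    rng = np.random.default_rng(seed)
    sample = [sampler(rng) for _ in range(N)]           # w^1, ..., w^N i.i.d. ~ P
    f_N = lambda x: np.mean([h(x, w) for w in sample])  # SAA objective (1.2)
    return minimize(f_N, x0, method="Nelder-Mead").x    # an optimal solution hat{x}_N

# Illustrative data: h(x, w) = |w - x| with w uniform on [-1, 1];
# the true optimal solution is the median of P, i.e. x* = 0.
x_hat = saa_solve(lambda x, w: abs(w - x[0]),
                  lambda rng: rng.uniform(-1.0, 1.0),
                  x0=np.array([0.5]), N=1000)
print(x_hat)  # close to 0 for large N
```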
There are numerous publications where various aspects of convergence properties of $\hat{x}_N$ are discussed. Suppose that the true problem has a non empty set $A$ of optimal solutions. It is possible to show that, under mild regularity conditions, the distance $\mathrm{dist}(\hat{x}_N, A)$, from $\hat{x}_N$ to the set $A$, converges with probability one (w.p.1) to zero as $N \to \infty$. There is a vast literature in Statistics dealing with such consistency properties of empirical estimators. In the context of stochastic programming we can mention recent works [9], [14], [17], where this problem is approached from the point of view of the epiconvergence theory.
* School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA. Email: ashapiro@isye.gatech.edu. This work was supported, in part, by grant DMI-9713878 from the National Science Foundation.
† Department of Industrial, Welding and Systems Engineering, The Ohio State University, Columbus, Ohio 43210-1271, USA. Email: homem-de-mello.1@osu.edu
It is also possible to give various estimates of the rate of convergence of $\hat{x}_N$ to $A$. Central Limit Theorem type results give such estimates of order $O_p(N^{-1/2})$ for the distance $\mathrm{dist}(\hat{x}_N, A)$ (e.g., [15], [20]), and the Large Deviations theory shows that one may expect that, for any given $\varepsilon > 0$, the probability of the event $\mathrm{dist}(\hat{x}_N, A) \ge \varepsilon$ approaches zero exponentially fast as $N \to \infty$ (see, e.g., [13], [16], [19]). These are general results and it seems that they describe the situation quite accurately when the involved distributions are continuous. However, it appears that the asymptotics are completely different if the distributions are discrete. We show that in such cases, under rather natural assumptions, the approximating problem (1.2) provides an exact optimal solution of the true problem (1.1) for $N$ large enough. That is, $\hat{x}_N \in A$ w.p.1 for sufficiently large $N$. Even more surprisingly, we show that the probability of the event $\{\hat{x}_N \notin A\}$ tends to zero exponentially fast as $N \to \infty$. That is what happens in the case of two stage stochastic programming with recourse if the corresponding distributions are discrete. This indicates that, in such cases, Monte Carlo simulation based methods could be very efficient.
In order to motivate the discussion, let us consider the following simple example. Let $Y_1, \ldots, Y_m$ be independent identically distributed real valued random variables. Consider the following optimization problem

$$\min_{x \in \mathbb{R}^m} \Big\{ f(x) := \mathbb{E}\Big[ \sum_{i=1}^m |Y_i - x_i| \Big] \Big\}. \tag{1.3}$$
This problem is a particular case of two stage stochastic programming with simple recourse. Clearly the objective function $f(x)$ can be written in the form $f(x) := \sum_{i=1}^m f_i(x_i)$, where $f_i(x_i) := \mathbb{E}\{|Y_i - x_i|\}$. Therefore the above optimization problem is separable. It is well known that a minimizer of $f_i(\cdot)$ is given by the median of the distribution of $Y_i$. Suppose that the distribution of the random variables $Y_i$ is symmetrical around zero. Then $x^* := (0, \ldots, 0)$ is an optimal solution of (1.3).
Now let $Y^1, \ldots, Y^N$ be an i.i.d. random sample of $N$ realizations of the random vector $Y = (Y_1, \ldots, Y_m)$. Consider the following sample average approximation of (1.3):

$$\min_{x \in \mathbb{R}^m} \Big\{ \hat{f}_N(x) := N^{-1} \sum_{j=1}^N h(x, Y^j) \Big\}, \tag{1.4}$$
where $h(x, y) := \sum_{i=1}^m |y_i - x_i|$, with $x, y \in \mathbb{R}^m$. An optimal solution of the above approximating problem (1.4) is given by $\hat{x}_N := (\hat{x}_{1N}, \ldots, \hat{x}_{mN})$, where $\hat{x}_{iN}$ is the sample median of $Y^1_i, \ldots, Y^N_i$.
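In this separable instance the approximating problem (1.4) is therefore solved coordinatewise by sample medians, with no numerical optimizer needed. A small sketch (the uniform distribution is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 100, 50
# Rows are the realizations Y^1, ..., Y^N; each coordinate Y_i is
# symmetric around zero (uniform on [-1, 1], purely for illustration).
Y = rng.uniform(-1.0, 1.0, size=(N, m))

# hat{x}_N = vector of coordinatewise sample medians: an exact optimal
# solution of the approximating problem (1.4).
x_hat = np.median(Y, axis=0)
print(np.max(np.abs(x_hat)))  # worst-coordinate deviation from x* = 0
```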
Suppose for the moment that $m = 1$, i.e. we are minimizing $\mathbb{E}\{|Y - x|\}$ over $x \in \mathbb{R}$. We assume that the distribution of $Y$ is symmetrical around zero and hence $x^* = 0$ is an optimal solution of the true problem. Suppose now that the distribution of $Y$ is continuous with density function $g(y)$. Then it is well known (e.g., [6]) that the corresponding sample median $\hat{x}_N$ is asymptotically normal. That is, $N^{1/2}(\hat{x}_N - x^*)$ converges in distribution to normal with zero mean and variance $[2g(x^*)]^{-2}$. For example, if $Y$ is uniformly distributed on the interval $[-1, 1]$, then $N^{1/2}(\hat{x}_N - x^*) \Rightarrow N(0, 1)$. This means that for $N = 100$ we may expect $\hat{x}_N$ to be in the (so-called confidence) interval $[-0.2, 0.2]$ with probability of about 95%. Now for $m > 1$ we have that the events $\hat{x}_{iN} \in [-0.2, 0.2]$, $i = 1, \ldots, m$, are independent (this is because we assume that the $Y_i$ are independent). Therefore the probability that each sample median $\hat{x}_{iN}$ will be inside the interval $[-0.2, 0.2]$ is about $0.95^m$. For example, for $m = 50$, this probability becomes $0.95^{50} = 0.077$. If we want that probability to be about 0.95 we have to increase the interval to $[-0.3, 0.3]$, which constitutes 30% of the range of the random variable $Y$. In other words, for that sample size and with $m = 50$ our sample estimate will not be accurate.
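The 95% figure and the joint coverage $0.95^m$ follow from the normal approximation $\hat{x}_{iN} \approx N(0, 1/N)$ used in the text; a quick check (the interval half-widths 0.2 and 0.3 are the ones discussed above):

```python
from scipy.stats import norm

N, m = 100, 50
sd = N ** -0.5  # asymptotic standard deviation of the sample median here

for half_width in (0.2, 0.3):
    # Per-coordinate coverage P(|x_hat_i| <= half_width), then the joint
    # coverage over m independent coordinates.
    p_one = norm.cdf(half_width / sd) - norm.cdf(-half_width / sd)
    print(half_width, round(p_one, 4), round(p_one ** m, 4))
# Prints roughly: 0.2 -> per-coordinate 0.9545, joint 0.0975
#                 0.3 -> per-coordinate 0.9973, joint 0.8736
```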
The situation becomes quite different if we assume that $Y$ has a discrete distribution. Suppose now that $Y$ can take values $-1$, $0$ and $1$ with equal probabilities $1/3$. In that case the true problem has unique optimal solution $x^* = 0$. The corresponding sample estimate $\hat{x}_N$ can be equal to $-1$, $0$ or $1$. We have that the event $\{\hat{x}_N = 1\}$ happens if more than half of the sample points are equal to one. The probability of that is given by $P(X > N/2)$, where $X$ has a binomial distribution $B(N, 1/3)$. If exactly half of the sample points are equal to one, then the sample estimate can be any number in the interval $[0, 1]$. Similar conclusions hold for the event $\{\hat{x}_N = -1\}$. Therefore the probability that $\hat{x}_N = 0$ is at least $1 - 2P(X \ge N/2)$. For $N = 100$, this probability is $0.9992$. Therefore the probability that the sample estimate $\hat{x}_N$, given by an optimal solution of the approximating problem (1.4) with the sample size $N = 100$ and the number of random variables $m = 50$, is exactly equal to the true optimal solution is at least $0.9992^{50} = 0.96$. With the sample size $N = 120$ and the number of random variables $m = 200$ this probability, of $\hat{x}_N = 0$, is about $0.9998^{200} = 0.95$. Note that the number of scenarios for that problem is $3^{200}$, which is not small by any standard. And yet with a sample size of only 120 the approximating problem produces an estimator which is exactly equal to the true optimal solution with probability of 95%.
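These binomial bounds are easy to reproduce; a minimal sketch using scipy.stats.binom, where sf(k) returns P(X > k), so P(X >= N/2) is sf(N/2 - 1) for even N:

```python
from scipy.stats import binom

for N, m in ((100, 50), (120, 200)):
    # X ~ B(N, 1/3) counts the sample points equal to one (or to minus one).
    p_tail = binom.sf(N // 2 - 1, N, 1.0 / 3.0)  # P(X >= N/2)
    p_zero = 1.0 - 2.0 * p_tail                  # lower bound on P(x_hat_i = 0)
    print(N, m, round(p_zero, 4), round(p_zero ** m, 4))
# N=100: p_zero ~ 0.9992, and over m=50 independent coordinates ~ 0.96
# N=120: p_zero ~ 0.9998, and over m=200 independent coordinates ~ 0.95
```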
The above problem, although simple, illustrates the phenomenon of exponential convergence referred to in the title of the paper. In the above example the corresponding probabilities can be calculated in a closed form, but in the general case of course we cannot expect to do so. The purpose of this paper is to extend this discussion to a class of stochastic programming problems satisfying some assumptions. Our goal is to exhibit some qualitative (rather than quantitative) results. We do not propose an algorithm, but rather show asymptotic properties of Monte Carlo simulation based methods.

The paper is organized as follows. In section 2 we show almost sure (w.p.1) occurrence of the event $\{\hat{x}_N \in A\}$ (recall that $A$ is the set of optimal solutions of the "true" problem). In section 3 we take a step further and, using techniques from Large Deviations theory, we show that the probability of that event approaches one exponentially fast. In section 4 we discuss the median problem in more detail, and present some numerical results for a two-stage stochastic programming problem with complete recourse. Finally, section 5 presents some conclusions.
2. Almost sure convergence. Consider the "true" stochastic programming problem (1.1). For the sake of simplicity we assume that the corresponding expected value function $f(x) := \mathbb{E}_P h(x, \omega)$ exists (and in particular is finite valued) for all $x \in \mathbb{R}^m$. For example, if the probability measure $P$ has a finite support (i.e. the distribution $P$ is discrete and can take a finite number of different values), and hence the space $\Omega$ can be taken to be finite, say $\Omega := \{\omega_1, \ldots, \omega_K\}$, and $P$ is given by the probabilities $P\{\omega = \omega_k\} = p_k$, $k = 1, \ldots, K$, we have

$$\mathbb{E}_P h(x, \omega) = \sum_{k=1}^K p_k h(x, \omega_k). \tag{2.1}$$
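When $P$ has finite support, (2.1) is a finite weighted sum, and the sample average in (1.2) simply replaces the true weights $p_k$ by empirical frequencies. A small sketch (the integrand h and the scenario values are illustrative placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
scenarios = np.array([-1.0, 0.0, 1.0])  # omega_1, ..., omega_K (placeholder values)
p = np.array([1 / 3, 1 / 3, 1 / 3])     # p_1, ..., p_K

h = lambda x, w: abs(w - x)             # placeholder convex integrand

x = 0.25
f_true = sum(pk * h(x, wk) for pk, wk in zip(p, scenarios))  # the sum (2.1)

sample = rng.choice(scenarios, size=1000, p=p)               # i.i.d. draws from P
f_N = np.mean([h(x, w) for w in sample])                     # SAA value at x
print(f_true, f_N)  # f_N -> f_true w.p.1 as N grows (Law of Large Numbers)
```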
We assume that the feasible set $\Theta$ is closed and convex, and that for every $\omega \in \Omega$, the function $h(\cdot, \omega)$ is convex. This implies that the expected value function $f(\cdot)$ is also convex, and hence the "true" problem (1.1) is convex. Also if $P$ is discrete and the functions $h(\cdot, \omega_k)$, $k = 1, \ldots, K$, are piecewise linear and convex, then $f(\cdot)$ is piecewise linear and convex. That is what happens in two stage stochastic programming with a finite number of scenarios.
Let $\omega^1, \ldots, \omega^N$ be an i.i.d. random sample in $(\Omega, \mathcal{F})$, generated according to the distribution $P$, and consider the corresponding approximating program (1.2). Note that, since the functions $h(\cdot, \omega^j)$ are convex, the approximating (sample average) function $\hat{f}_N(\cdot)$ is also convex, and hence the approximating program (1.2) is convex.

We show in this section that, under some natural assumptions which hold for instance in the case of two stage stochastic programming with a finite number of scenarios, with probability one (w.p.1) for $N$ large enough any optimal solution of the approximating problem (1.2) belongs to the set of optimal solutions of the true problem (1.1). That is, problem (1.2) yields an exact optimal solution (w.p.1) when $N$ is sufficiently large.

The statement "w.p.1 for $N$ large enough" should be understood in the sense that for $P$-almost every $\omega \in \Omega$ there exists $N^* = N^*(\omega)$ such that for any $N \ge N^*$ the corresponding statement holds. The number $N^*$ is a function of $\omega$, i.e. depends on the random sample, and therefore in itself is random. Note also that, since convergence w.p.1 implies convergence in probability, the above statement implies that the probability of the corresponding event tends to one as the sample size $N$ tends to infinity.
We denote by $A$ the set of optimal solutions of the true problem (1.1), and by $f'(x, d)$ the directional derivative of $f$ at $x$ in the direction $d$. Note that the set $A$ is convex and closed, and since $f$ is a real valued convex function, the directional derivative $f'(x, d)$ exists, for all $x$ and $d$, and is convex in $d$. We discuss initially the case when $A$ is a singleton; later we will consider the general setting.
Assumption (A) The true problem (1.1) possesses a unique optimal solution $x^*$, i.e. $A = \{x^*\}$, and there exists a positive constant $c$ such that

$$f(x) \ge f(x^*) + c\,\|x - x^*\| \quad \forall\, x \in \Theta. \tag{2.2}$$
Of course condition (2.2), in itself, implies that $x^*$ is the unique optimal solution of (1.1). In the approximation theory optimal solutions satisfying (2.2) are called sharp minima. It is not difficult to show, since problem (1.1) is convex, that assumption (A) holds if and only if

$$f'(x^*, d) > 0 \quad \forall\, d \in T_\Theta(x^*) \setminus \{0\}, \tag{2.3}$$

where $T_\Theta(x^*)$ denotes the tangent cone to $\Theta$ at $x^*$.
In particular, if $f(x)$ is differentiable at $x^*$, then assumption (A) (or equivalently (2.3)) holds if and only if $-\nabla f(x^*)$ belongs to the interior of the normal cone to $\Theta$ at $x^*$. Note that, since $f'(x^*, \cdot)$ is a positively homogeneous convex real valued (and hence continuous) function, it follows from (2.3) that $f'(x^*, d) \ge \varepsilon \|d\|$ for some $\varepsilon > 0$ and all $d \in T_\Theta(x^*)$. We refer to a recent paper [4], and references therein, for a discussion of that condition and some of its generalizations.
If the function $f(x)$ is piecewise linear and the set $\Theta$ is polyhedral, then problem (1.1) can be formulated as a linear programming problem, and the above assumption (A) always holds provided $x^*$ is the unique optimal solution of (1.1). This happens, for example, in the case of a two stage linear stochastic programming problem with a finite number of scenarios, provided it has a unique optimal solution. Note that assumption (A) is not restricted to such situations only. In fact, in some of our numerical experiments sharp minima (i.e. assumption (A)) happened quite often in the case of continuous (normal) distributions. Furthermore, because the problem is assumed to be convex, sharp minimality is equivalent to first order sufficient conditions. Under such conditions, the first order (i.e. linear) growth (2.2) of $f(x)$ holds globally, i.e. for all $x \in \Theta$.
Theorem 2.1. Suppose that: (i) for every $\omega \in \Omega$ the function $h(\cdot, \omega)$ is convex; (ii) the expected value function $f(\cdot)$ is well defined and is finite valued; (iii) the set $\Theta$ is closed and convex; (iv) assumption (A) holds. Then w.p.1 for $N$ large enough the approximating problem (1.2) has a unique optimal solution $\hat{x}_N$ and $\hat{x}_N = x^*$.
Proof of the above theorem is based on the following proposition. Results of that proposition (perhaps not exactly in that form) are basically known, but since its proof is simple we give it for the sake of completeness. Denote by $h'_\omega(x, d)$ the directional derivative of $h(\cdot, \omega)$ at the point $x$ in the direction $d$, and by $\mathbb{H}(B, C)$ the Hausdorff distance between sets $B, C \subset \mathbb{R}^m$, that is

$$\mathbb{H}(B, C) := \max\left\{ \sup_{x \in C} \mathrm{dist}(x, B), \; \sup_{x \in B} \mathrm{dist}(x, C) \right\}. \tag{2.4}$$
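For finite point sets the definition (2.4) translates directly into code; a minimal sketch:

```python
import numpy as np

def hausdorff(B, C):
    """Hausdorff distance (2.4) between finite sets of points in R^m,
    passed as arrays of shape (nB, m) and (nC, m)."""
    D = np.linalg.norm(B[:, None, :] - C[None, :, :], axis=2)  # pairwise distances
    return max(D.min(axis=0).max(),   # sup over x in C of dist(x, B)
               D.min(axis=1).max())   # sup over x in B of dist(x, C)

B = np.array([[0.0], [1.0]])
C = np.array([[0.0], [3.0]])
print(hausdorff(B, C))  # 2.0: the point 3 in C is at distance 2 from B
```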
Proposition 2.2. Suppose that the assumptions (i) and (ii) of Theorem 2.1 are satisfied. Then, for any $x, d \in \mathbb{R}^m$, the following holds:

$$f'(x, d) = \mathbb{E}_P\{h'_\omega(x, d)\}, \tag{2.5}$$

$$\lim_{N \to \infty} \, \sup_{\|d\| \le 1} \left| f'(x, d) - \hat{f}'_N(x, d) \right| = 0 \quad \text{w.p.1}, \tag{2.6}$$

$$\lim_{N \to \infty} \mathbb{H}\big( \partial \hat{f}_N(x), \partial f(x) \big) = 0 \quad \text{w.p.1}. \tag{2.7}$$
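Formula (2.5) and the convergence behind (2.6) can be illustrated numerically in the one-dimensional case $h(x, \omega) = |\omega - x|$, for which a short computation from the definition (ours, not from the excerpt) gives $h'_\omega(x, d) = -d \, \mathrm{sign}(\omega - x)$ for $\omega \neq x$ and $h'_x(x, d) = |d|$:

```python
import numpy as np

rng = np.random.default_rng(2)

def h_prime(x, d, w):
    # Directional derivative of x -> |w - x| at x in the direction d.
    return abs(d) if w == x else -d * np.sign(w - x)

x = 0.0
omegas = np.array([-1.0, 0.0, 1.0])  # the discrete distribution from section 1
f_prime = lambda d: np.mean([h_prime(x, d, w) for w in omegas])  # (2.5), p_k = 1/3

sample = rng.choice(omegas, size=5000)
fN_prime = lambda d: np.mean([h_prime(x, d, w) for w in sample])  # sample average

for d in (1.0, -1.0):
    print(d, f_prime(d), fN_prime(d))  # both close to |d|/3 here
```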
Proof. Since $f(\cdot)$ is convex we have that

$$f'(x, d) = \inf_{t > 0} \frac{f(x + td) - f(x)}{t}, \tag{2.8}$$

and the ratio in the right hand side of (2.8) decreases monotonically as $t$ decreases to zero, and similarly for the functions $h(\cdot, \omega)$. It follows then by the Monotone Convergence Theorem that

$$f'(x, d) = \mathbb{E}_P\left\{ \inf_{t > 0} \frac{h(x + td, \omega) - h(x, \omega)}{t} \right\}, \tag{2.9}$$

and hence the right hand side of (2.5) is well defined and the equation follows.
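The monotone decrease of the difference quotients in (2.8), which is what licenses the Monotone Convergence Theorem here, is easy to observe numerically; a quick sketch with an arbitrary convex choice of f:

```python
f = lambda x: x * x + abs(x)   # an arbitrary convex function with f'(0; 1) = 1
x, d = 0.0, 1.0

ts = [2.0, 1.0, 0.5, 0.25, 0.125]                      # t decreasing to zero
quotients = [(f(x + t * d) - f(x)) / t for t in ts]
print(quotients)  # [3.0, 2.0, 1.5, 1.25, 1.125] -> f'(x; d) = 1
assert all(a >= b for a, b in zip(quotients, quotients[1:]))  # monotone decrease
```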
We have that

$$\hat{f}'_N(x, d) = N^{-1} \sum_{j=1}^N h'_{\omega^j}(x, d). \tag{2.10}$$

Therefore by the strong form of the Law of Large Numbers it follows from (2.5) that for any $d \in \mathbb{R}^m$, $\hat{f}'_N(x, d)$ converges to $f'(x, d)$ w.p.1 as $N \to \infty$. Consequently, for any countable set $D \subset \mathbb{R}^m$ we have that the event "$\lim_{N \to \infty} \hat{f}'_N(x, d) = f'(x, d)$ for all $d \in D$" happens w.p.1. Let us take a countable and dense subset $D$ of $\mathbb{R}^m$. Recall that if a sequence of real valued convex functions converges pointwise on a dense subset of $\mathbb{R}^m$, then it converges uniformly on compact subsets of $\mathbb{R}^m$.

References

Nonlinear Programming.
A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications.
J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms.
J.F. Bonnans and A. Shapiro, Perturbation Analysis of Optimization Problems.
Large Deviations.