
The Sample Average Approximation Method for Stochastic Discrete Optimization

TL;DR
A Monte Carlo simulation-based approach to stochastic discrete optimization problems, in which a random sample is generated and the expected value function is approximated by the corresponding sample average function.
Abstract
In this paper we study a Monte Carlo simulation-based approach to stochastic discrete optimization problems. The basic idea of such methods is that a random sample is generated and the expected value function is approximated by the corresponding sample average function. The obtained sample average optimization problem is solved, and the procedure is repeated several times until a stopping criterion is satisfied. We discuss convergence rates, stopping rules, and computational complexity of this procedure and present a numerical example for the stochastic knapsack problem.


THE SAMPLE AVERAGE APPROXIMATION METHOD FOR STOCHASTIC DISCRETE OPTIMIZATION

ANTON J. KLEYWEGT†‡ AND ALEXANDER SHAPIRO†§
Abstract. In this paper we study a Monte Carlo simulation-based approach to stochastic discrete optimization problems. The basic idea of such methods is that a random sample is generated and consequently the expected value function is approximated by the corresponding sample average function. The obtained sample average optimization problem is solved, and the procedure is repeated several times until a stopping criterion is satisfied. We discuss convergence rates and stopping rules of this procedure and present a numerical example of the stochastic knapsack problem.
Key words. Stochastic programming, discrete optimization, Monte Carlo sampling, Law of
Large Numbers, Large Deviations theory, sample average approximation, stopping rules, stochastic
knapsack problem
AMS subject classifications. 90C10, 90C15
1. Introduction. In this paper we consider optimization problems of the form

    min_{x ∈ S} { g(x) := IE_P[G(x, W)] }.    (1.1)

Here W is a random vector having probability distribution P, G(x, w) is a real-valued function, and S is a finite set; for example, S can be a finite subset of IR^k with integer coordinates. We assume that the expected value function g(x) is well defined, i.e., for every x ∈ S the function G(x, ·) is P-measurable and IE_P{|G(x, W)|} < ∞. We are particularly interested in problems for which the expected value function g(x) := IE_P[G(x, W)] cannot be written in closed form and/or its values cannot be easily calculated, while G(x, w) is easily computable for given x and w.
It is well known that many discrete optimization problems are hard to solve. Here, on top of this, we have additional difficulties, since the objective function g(x) can be complicated and/or difficult to compute even approximately. Therefore stochastic discrete optimization problems are difficult indeed, and little progress in solving such problems numerically has been reported so far. A discussion of two-stage stochastic integer programming problems with recourse can be found in Birge and Louveaux [2]. A branch and bound approach to solving stochastic integer programming problems was suggested by Norkin, Pflug, and Ruszczynski [9]. Schultz, Stougie, and Van der Vlerk [10] suggested an algebraic approach to solving stochastic programs with integer recourse by using a framework of Gröbner basis reductions.
In this paper we study a Monte Carlo simulation-based approach to stochastic discrete optimization problems. The basic idea is simple: a random sample of W is generated, and the expected value function is approximated by the corresponding sample average function. The obtained sample average optimization problem is solved, and the procedure is repeated several times until a stopping criterion is satisfied. The idea of using sample average approximations for solving stochastic programs is a natural one and has been used by various authors over the years. Such an approach was used in the context of a stochastic knapsack problem in a recent paper of Morton and Wood [7].
† School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0205.
‡ Supported by the National Science Foundation under grant DMI-9875400.
§ Supported by the National Science Foundation under grant DMI-9713878.

The organization of this paper is as follows. In the next section we discuss the statistical inference of the sample average approximation method. In particular, we show that, with probability approaching one exponentially fast as the sample size increases, an optimal solution of the sample average approximation problem provides an exact optimal solution of the "true" problem (1.1). In section 3 we outline an algorithm design for the sample average approximation approach to solving (1.1), and in particular we discuss various stopping rules. In section 4 we present a numerical example of the sample average approximation method applied to a stochastic knapsack problem, and section 5 gives conclusions.
2. Convergence Results. As mentioned in the introduction, we are interested in solving stochastic discrete optimization problems of the form (1.1). Let W^1, ..., W^N be an i.i.d. random sample of N realizations of the random vector W. Consider the sample average function

    ĝ_N(x) := (1/N) Σ_{n=1}^{N} G(x, W^n)

and the associated problem

    min_{x ∈ S} ĝ_N(x).    (2.1)

We refer to (1.1) and (2.1) as the "true" (or expected value) and sample average approximation (SAA) problems, respectively. Note that IE[ĝ_N(x)] = g(x).
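To make the SAA scheme concrete, here is a minimal sketch (not the authors' code) that solves one SAA replication of a hypothetical stochastic knapsack-style instance by enumerating the finite feasible set S. The item data, penalty coefficient, and distribution of W are invented for illustration.

```python
import itertools
import random

def saa_solve(G, S, sample):
    """Solve the SAA problem min_{x in S} (1/N) * sum_n G(x, W^n) by enumeration."""
    def g_hat(x):
        return sum(G(x, w) for w in sample) / len(sample)
    x_best = min(S, key=g_hat)
    return x_best, g_hat(x_best)

# Hypothetical instance: choose a subset x of 4 items with random sizes W_i;
# G is a cost (rewards enter with a minus sign), with a penalty for
# exceeding the capacity c.
rewards = [4.0, 3.0, 2.5, 2.0]
capacity = 5.0

def G(x, w):
    size = sum(w[i] for i in range(4) if x[i])
    reward = sum(rewards[i] for i in range(4) if x[i])
    return -reward + 10.0 * max(0.0, size - capacity)

S = list(itertools.product([0, 1], repeat=4))  # finite feasible set, |S| = 16

rng = random.Random(0)
sample = [[rng.uniform(1.0, 3.0) for _ in range(4)] for _ in range(200)]
x_hat, v_hat = saa_solve(G, S, sample)
```

Repeating this for several independent samples, and comparing the resulting solutions x̂ across replications, is the basis of the stopping rules discussed in section 3.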
Since the feasible set S is finite, problems (1.1) and (2.1) have nonempty sets of optimal solutions, denoted S* and Ŝ_N, respectively. Let v* and v̂_N denote the optimal values,

    v* := min_{x ∈ S} g(x)   and   v̂_N := min_{x ∈ S} ĝ_N(x),

of the respective problems. We also consider sets of ε-optimal solutions. That is, for ε ≥ 0, we say that x̄ is an ε-optimal solution of (1.1) if x̄ ∈ S and g(x̄) ≤ v* + ε. The sets of all ε-optimal solutions of (1.1) and (2.1) are denoted by S^ε and Ŝ^ε_N, respectively. Clearly, for ε = 0, the set S^ε coincides with S*, and Ŝ^ε_N coincides with Ŝ_N.
2.1. Convergence of Objective Values and Solutions. In the following proposition we show convergence with probability one (w.p.1) of the above statistical estimators. By the statement "an event happens w.p.1 for N large enough" we mean that for P-almost every realization ω = {W^1, W^2, ...} of the random sequence, there exists an integer N(ω) such that the considered event happens for all samples {W^1, ..., W^n} from ω with n ≥ N(ω). Note that in such a statement the integer N(ω) depends on the sequence ω of realizations and therefore is random.

Proposition 2.1. The following two properties hold: (i) v̂_N → v* w.p.1 as N → ∞, and (ii) for any ε ≥ 0, the event {Ŝ^ε_N ⊂ S^ε} happens w.p.1 for N large enough.

Proof. By the strong Law of Large Numbers we have that for any x ∈ S, ĝ_N(x) converges to g(x) w.p.1 as N → ∞. Since the set S is finite, and the union of a finite number of sets each of measure zero also has measure zero, it follows that w.p.1, ĝ_N(x) converges to g(x) uniformly in x ∈ S. That is, w.p.1,

    δ_N := max_{x ∈ S} |ĝ_N(x) − g(x)| → 0   as N → ∞.    (2.2)

Since |v̂_N − v*| ≤ δ_N, it follows that w.p.1, v̂_N → v* as N → ∞.

For a given ε ≥ 0, consider the number

    α(ε) := min_{x ∈ S\S^ε} g(x) − v* − ε.    (2.3)

Since for any x ∈ S\S^ε it holds that g(x) > v* + ε, and the set S is finite, it follows that α(ε) > 0.

Let N be large enough such that δ_N < α(ε)/2. Then v̂_N < v* + α(ε)/2, and for any x ∈ S\S^ε it holds that ĝ_N(x) > v* + ε + α(ε)/2. It follows that if x ∈ S\S^ε, then ĝ_N(x) > v̂_N + ε, and hence x does not belong to the set Ŝ^ε_N. The inclusion Ŝ^ε_N ⊂ S^ε follows, which completes the proof.
It follows that if, for some ε ≥ 0, S^ε = {x*} is a singleton, then w.p.1, Ŝ^ε_N = {x*} for N large enough. In particular, if the true problem (1.1) has a unique optimal solution x*, then w.p.1, for sufficiently large N the approximating problem (2.1) has a unique optimal solution x̂_N and x̂_N = x*.

In the next section, and in section 4, it is demonstrated that α(ε), defined in (2.3), is an important measure of the well-conditioning of a stochastic discrete optimization problem.
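Whenever the true values g(x) are known, the conditioning number α(ε) of (2.3) is straightforward to compute by enumeration. The sketch below does so for a small hypothetical table of objective values (the values are invented for illustration).

```python
def alpha(eps, g):
    """Compute alpha(eps) = min over x in S \\ S^eps of g(x) - v* - eps.

    g maps each feasible x to its true objective value; returns +inf when
    every x is already eps-optimal (i.e., S \\ S^eps is empty).
    """
    v_star = min(g.values())
    outside = [gx for gx in g.values() if gx > v_star + eps]
    return min(outside) - v_star - eps if outside else float("inf")

# Hypothetical true objective values on a 4-point feasible set.
g = {"a": 1.0, "b": 1.3, "c": 2.0, "d": 5.0}
a0 = alpha(0.0, g)   # gap between the best and second-best value: 0.3
a5 = alpha(0.5, g)   # 2.0 - 1.0 - 0.5 = 0.5
```

A small α(ε), as in this toy table with ε = 0, signals that near-optimal solutions crowd the optimum, so a more accurate sample average approximation is needed to separate them.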
2.2. Convergence Rates. The above results do not say anything about the rates of convergence of v̂_N and Ŝ^ε_N to their true counterparts. In this section we investigate such rates of convergence. By using the theory of Large Deviations (LD) we show that, under mild regularity conditions, the probability of the event {Ŝ^ε_N ⊂ S^ε} approaches one exponentially fast as N → ∞. Next we briefly outline some background of the LD theory.

Consider an i.i.d. sequence X_1, ..., X_N of replications of a random variable X, and let Z_N := N^{−1} Σ_{i=1}^{N} X_i be the corresponding sample average. Then for any real numbers a and t ≥ 0 we have that P(Z_N ≥ a) = P(e^{tZ_N} ≥ e^{ta}), and hence, by Chebyshev's inequality,

    P(Z_N ≥ a) ≤ e^{−ta} IE[e^{tZ_N}] = e^{−ta} [M(t/N)]^N,

where M(t) := IE{e^{tX}} is the moment generating function of X. By taking the logarithm of both sides of the above inequality, changing variables t′ := t/N, and minimizing over t′ > 0, we obtain

    (1/N) log [P(Z_N ≥ a)] ≤ −I(a),    (2.4)

where

    I(z) := sup_{t ≥ 0} {tz − Λ(t)}

is the conjugate of the logarithmic moment generating function Λ(t) := log M(t). In LD theory, I(z) is called the large deviations rate function, and the inequality (2.4) corresponds to the upper bound of Cramér's LD theorem.

Although we do not need this in the subsequent analysis, it could be mentioned that the constant I(a) in (2.4) gives, in a sense, the best possible exponential rate at which the probability P(Z_N ≥ a) converges to zero. This follows from the corresponding lower bound of Cramér's LD theorem. For a thorough discussion of the LD theory, an interested reader is referred to Dembo and Zeitouni [4].
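The upper bound (2.4) can be checked numerically. For X ~ Bernoulli(p) the rate function has the well-known closed form I(a) = a log(a/p) + (1 − a) log((1 − a)/(1 − p)); the sketch below (a sanity check, not part of the paper) compares the resulting Chernoff bound with the exact binomial tail probability for p = 1/2, with parameter choices made up for illustration.

```python
import math

def bernoulli_rate(a, p):
    """Large-deviations rate function I(a) for X ~ Bernoulli(p), with p < a < 1."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

p, a, N = 0.5, 0.7, 200
chernoff = math.exp(-N * bernoulli_rate(a, p))  # bound (2.4): P(Z_N >= a) <= e^{-N I(a)}

# Exact tail P(Z_N >= a): Binomial(N, 1/2) puts mass C(N, k) / 2^N on k successes.
k0 = math.ceil(a * N)
exact = sum(math.comb(N, k) for k in range(k0, N + 1)) / 2**N
```

Here the exact tail indeed lies below the Chernoff bound, and both decay exponentially in N, as the LD upper bound predicts.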

The rate function I(z) has the following properties. Suppose that the random variable X has mean µ. Then the function I(z) is convex, attains its minimum at z = µ, and I(µ) = 0. Moreover, suppose that the moment generating function M(t) of X is finite valued for all t in a neighborhood of t = 0. Then it follows by the dominated convergence theorem that M(t), and hence the function Λ(t), are infinitely differentiable at t = 0, and Λ′(0) = µ. Consequently, for a > µ the derivative of ψ(t) := ta − Λ(t) at t = 0 is greater than zero, and hence ψ(t) > 0 for t > 0 small enough. It follows that in that case I(a) > 0.
Now we return to the problems (1.1) and (2.1). Consider ε ≥ 0 and the numbers δ_N and α(ε) defined in (2.2) and (2.3), respectively. Then it holds that if δ_N < α(ε)/2, then {Ŝ^ε_N ⊂ S^ε}. Since the complement of the event {δ_N < α(ε)/2} is given by the union of the events {|ĝ_N(x) − g(x)| ≥ α(ε)/2} over all x ∈ S, and the probability of that union is less than or equal to the sum of the corresponding probabilities, it follows that

    1 − P(Ŝ^ε_N ⊂ S^ε) ≤ Σ_{x ∈ S} P{|ĝ_N(x) − g(x)| ≥ α(ε)/2}.
We make the following assumption.

Assumption A. For any x ∈ S, the moment generating function M(t) of the random variable G(x, W) is finite valued in a neighborhood of t = 0.

Under Assumption A, it follows from the LD upper bound (2.4) that for any x ∈ S there are constants γ_x > 0 and γ′_x > 0 such that

    P{|ĝ_N(x) − g(x)| ≥ α(ε)/2} ≤ e^{−Nγ_x} + e^{−Nγ′_x}.

Namely, the constants γ_x and γ′_x are given by the values of the rate functions of G(x, W) and −G(x, W) at g(x) + α(ε)/2 and −g(x) + α(ε)/2, respectively. Since the set S is finite, by taking γ := min_{x ∈ S} {γ_x, γ′_x}, the following result is obtained (it is similar to an asymptotic result for piecewise linear continuous problems derived in [12]).
Proposition 2.2. Suppose that Assumption A holds. Then there exists a constant γ > 0 such that the following inequality holds:

    lim sup_{N → ∞} (1/N) log [1 − P(Ŝ^ε_N ⊂ S^ε)] ≤ −γ.    (2.5)

The inequality (2.5) means that the probability of the event {Ŝ^ε_N ⊂ S^ε} approaches one exponentially fast as N → ∞. Unfortunately, it appears that the corresponding constant γ, giving the exponential rate of convergence, cannot be calculated (or even estimated) a priori, i.e., before the problem is solved. Therefore the above result is more of theoretical value. Let us mention at this point that the above constant γ depends, through the corresponding rate functions, on the number α(ε). Clearly, if α(ε) is "small", then an accurate approximation would be required in order to find an ε-optimal solution of the true problem. Therefore, in a sense, α(ε) characterizes the well-conditioning of the set S^ε.
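The exponential convergence in Proposition 2.2 can be observed empirically by running many independent SAA replications on a toy instance and recording how often the SAA optimal set lands inside S^ε. Everything below (the means, the noise model, the sample sizes) is invented for the experiment.

```python
import random
import statistics

mu = [0.0, 0.3, 0.6]  # true objective values; unique optimum x* = 0

def saa_optimal_set(rng, N, eps):
    """eps-optimal set of one SAA replication, independent N(0,1) noise per x."""
    g_hat = [m + statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(N))
             for m in mu]
    v_hat = min(g_hat)
    return {x for x, v in enumerate(g_hat) if v <= v_hat + eps}

def success_rate(rng, N, reps=500):
    """Fraction of replications in which the SAA optimal set lies inside {0}."""
    return sum(saa_optimal_set(rng, N, 0.0) <= {0} for _ in range(reps)) / reps

rng = random.Random(3)
low, high = success_rate(rng, 10), success_rate(rng, 200)  # high should exceed low
```

With the moderate gaps chosen here the success probability is visibly below one at N = 10 and very close to one at N = 200, consistent with the exponential rate (though, as noted above, the rate constant γ itself is not observable a priori).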
Next we discuss the asymptotics of the SAA optimal objective value v̂_N. For any subset S′ of S the inequality v̂_N ≤ min_{x ∈ S′} ĝ_N(x) holds. In particular, by taking S′ = S* we obtain that v̂_N ≤ min_{x ∈ S*} ĝ_N(x), and hence

    IE[v̂_N] ≤ IE[min_{x ∈ S*} ĝ_N(x)] ≤ min_{x ∈ S*} IE[ĝ_N(x)] = v*.
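This downward bias is easy to see in simulation. The toy instance below (three candidate solutions with independent Gaussian noise; all numbers invented) estimates IE[v̂_N] − v* by averaging over many replications.

```python
import random
import statistics

mu = [1.0, 1.1, 1.2]   # true objective values g(x); v* = 1.0
v_star = min(mu)

def v_hat(rng, N):
    """Optimal value of one SAA replication with sample size N."""
    g_hat = [m + statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(N))
             for m in mu]
    return min(g_hat)

rng = random.Random(42)
reps = [v_hat(rng, 50) for _ in range(2000)]
bias_estimate = statistics.fmean(reps) - v_star  # comes out negative
```

Because the three candidate values lie within about one standard error (1/√50 ≈ 0.14) of each other, the minimum is taken partly over noise, which drags IE[v̂_N] below v*; with a larger gap between the g-values, or a larger N, the bias shrinks.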

That is, the estimator v̂_N has a negative bias (cf. Mak, Morton, and Wood [6]).

It follows from Proposition 2.1 that w.p.1, for N sufficiently large, the set Ŝ_N of optimal solutions of the SAA problem is included in S*. In that case we have that

    v̂_N = min_{x ∈ Ŝ_N} ĝ_N(x) ≥ min_{x ∈ S*} ĝ_N(x).

Since the opposite inequality always holds, it follows that w.p.1, v̂_N − min_{x ∈ S*} ĝ_N(x) = 0 for N large enough. Multiplying both sides of this equation by √N, we obtain that w.p.1, √N [v̂_N − min_{x ∈ S*} ĝ_N(x)] = 0 for N large enough, and hence

    lim_{N → ∞} √N [v̂_N − min_{x ∈ S*} ĝ_N(x)] = 0   w.p.1.    (2.6)

Since convergence w.p.1 implies convergence in probability, it follows from (2.6) that √N [v̂_N − min_{x ∈ S*} ĝ_N(x)] converges in probability to zero, i.e.,

    v̂_N = min_{x ∈ S*} ĝ_N(x) + o_p(N^{−1/2}).
Furthermore, since v* = g(x) for any x ∈ S*, it follows that

    √N [min_{x ∈ S*} ĝ_N(x) − v*] = √N min_{x ∈ S*} [ĝ_N(x) − v*] = min_{x ∈ S*} √N [ĝ_N(x) − g(x)].

Suppose that for every x ∈ S the variance

    σ²(x) := Var{G(x, W)}    (2.7)

exists. Then it follows by the Central Limit Theorem (CLT) that, for any x ∈ S, √N [ĝ_N(x) − g(x)] converges in distribution to a normally distributed variable Y(x) with zero mean and variance σ²(x). Moreover, again by the CLT, the random variables Y(x) have the same autocovariance function as G(x, W), i.e., the covariance between Y(x) and Y(x′) is equal to the covariance between G(x, W) and G(x′, W) for any x, x′ ∈ S. Hence the following result is obtained (it is similar to an asymptotic result for stochastic programs with continuous decision variables which was derived in [11]). We use ⇒ to denote convergence in distribution.

Proposition 2.3. Suppose that the variances σ²(x), defined in (2.7), exist for every x ∈ S*. Then

    √N (v̂_N − v*) ⇒ min_{x ∈ S*} Y(x),    (2.8)

where the Y(x) are normally distributed random variables with zero mean and the autocovariance function given by the corresponding autocovariance function of G(x, W). In particular, if S* = {x*} is a singleton, then

    √N (v̂_N − v*) ⇒ N(0, σ²(x*)).    (2.9)
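The singleton-case limit (2.9) can also be checked by simulation: rescale v̂_N − v* by √N over many replications and compare the empirical spread with σ(x*). The instance below (unique optimum, σ(x*) = 0.5, independent noise per solution) is made up for the check.

```python
import math
import random
import statistics

mu = [0.0, 1.0, 2.0]       # unique optimum x* = 0, so v* = 0
sigma_star = 0.5           # standard deviation of G(x, W) around its mean

def v_hat(rng, N):
    """Optimal value of one SAA replication with sample size N."""
    g_hat = [m + statistics.fmean(rng.gauss(0.0, sigma_star) for _ in range(N))
             for m in mu]
    return min(g_hat)

rng = random.Random(7)
N = 100
scaled = [math.sqrt(N) * v_hat(rng, N) for _ in range(2000)]  # sqrt(N)(v_hat - v*)
mean_hat = statistics.fmean(scaled)   # should be near 0
sd_hat = statistics.stdev(scaled)     # should be near sigma_star = 0.5
```

Because the optimality gap (1.0) dwarfs the sampling noise at N = 100, the SAA minimizer is almost always x*, and the rescaled values behave like draws from N(0, σ²(x*)), as (2.9) asserts.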
3. Algorithm Design. In the previous section we established a number of convergence results for the sample average approximation method. The results describe how the optimal value v̂_N and the set Ŝ^ε_N of ε-optimal solutions of the SAA problem converge to their true counterparts v* and S^ε, respectively, as the sample size N increases. These results provide some theoretical justification for the proposed method. When designing an algorithm for solving stochastic discrete optimization problems, many additional issues have to be addressed. Some of these issues are discussed in this section.

References

Dembo, A., and Zeitouni, O., Large Deviations Techniques and Applications (book).

Birge, J. R., and Louveaux, F., Introduction to Stochastic Programming (book).

Multiple Comparison Procedures (book).

Mak, Morton, and Wood, Monte Carlo bounding techniques for determining solution quality in stochastic programs (journal article).
Frequently Asked Questions (11)
Q1. What are the contributions in "The sample average approximation method for stochastic discrete optimization"?

In this paper the authors study a Monte Carlo simulation-based approach to stochastic discrete optimization problems. The authors discuss convergence rates and stopping rules of this procedure and present a numerical example of the stochastic knapsack problem.

The fourth component, z_α (S²_{N′}(x̂)/N′ + S²_M/M)^{1/2}, can also be made small with relatively little computational effort by choosing N′ and M sufficiently large.

The bias seems to decrease more slowly for the instances with more decision variables than for the instances with fewer decision variables.

It was found that this convergence rate depends on the well-conditioning of the problem, which in turn tends to become poorer as the number of decision variables increases.

As mentioned above, in the second numerical experiment it was noticed that the optimality gap estimator is often large even if an optimal solution has been found, i.e., v* − g(x̂) = 0 (which is also a common problem in deterministic discrete optimization).

The first component, g(x̂) − ĝ_{N′}(x̂), can be made small with relatively little computational effort by choosing N′ sufficiently large.

It was shown that the probability that a replication of the SAA method produces an optimal solution increases at an exponential rate in the sample size N.

For the harder instance with 20 decision variables (instance 20D), the optimal solution was not produced in any of the 270 replications (but the second-best solution was produced 3 times); for instance 20R1, the optimal solution was first produced after m = 12 replications with sample size N = 150; and for instance 20R5, the optimal solution was first produced after m = 15 replications with sample size N = 50.

The second component, the true optimality gap v* − g(x̂), is often small after only a few replications m with a small sample size N.

The most noticeable effect is that the bias decreases much more slowly for the harder instances than for the randomly generated instances as the sample size N increases.

A more efficient optimality gap estimator can make a substantial contribution toward improving the performance guarantees of the SAA method during execution of the algorithm.