
The Sample Average Approximation Method for Stochastic Discrete Optimization

TL;DR
A Monte Carlo simulation-based approach to stochastic discrete optimization problems, in which a random sample is generated and the expected value function is approximated by the corresponding sample average function.
Abstract
In this paper we study a Monte Carlo simulation-based approach to stochastic discrete optimization problems. The basic idea of such methods is that a random sample is generated and the expected value function is approximated by the corresponding sample average function. The obtained sample average optimization problem is solved, and the procedure is repeated several times until a stopping criterion is satisfied. We discuss convergence rates, stopping rules, and computational complexity of this procedure and present a numerical example for the stochastic knapsack problem.


THE SAMPLE AVERAGE APPROXIMATION METHOD FOR STOCHASTIC DISCRETE OPTIMIZATION

ANTON J. KLEYWEGT†‡ AND ALEXANDER SHAPIRO†§
Abstract. In this paper we study a Monte Carlo simulation-based approach to stochastic discrete optimization problems. The basic idea of such methods is that a random sample is generated and consequently the expected value function is approximated by the corresponding sample average function. The obtained sample average optimization problem is solved, and the procedure is repeated several times until a stopping criterion is satisfied. We discuss convergence rates and stopping rules of this procedure and present a numerical example of the stochastic knapsack problem.
Key words. Stochastic programming, discrete optimization, Monte Carlo sampling, Law of
Large Numbers, Large Deviations theory, sample average approximation, stopping rules, stochastic
knapsack problem
AMS subject classifications. 90C10, 90C15
1. Introduction. In this paper we consider optimization problems of the form

    min_{x ∈ S} { g(x) := IE_P[G(x, W)] }.    (1.1)

Here W is a random vector having probability distribution P, G(x, w) is a real-valued function, and S is a finite set; for example, S can be a finite subset of IR^k with integer coordinates. We assume that the expected value function g(x) is well defined, i.e., for every x ∈ S the function G(x, ·) is P-measurable and IE_P{|G(x, W)|} < ∞. We are particularly interested in problems for which the expected value function g(x) := IE_P[G(x, W)] cannot be written in closed form and/or its values cannot be easily calculated, while G(x, w) is easily computable for given x and w.
It is well known that many discrete optimization problems are hard to solve. Here, on top of this, we have additional difficulties, since the objective function g(x) can be complicated and/or difficult to compute even approximately. Therefore stochastic discrete optimization problems are difficult indeed, and little progress in solving such problems numerically has been reported so far. A discussion of two-stage stochastic integer programming problems with recourse can be found in Birge and Louveaux [2]. A branch and bound approach to solving stochastic integer programming problems was suggested by Norkin, Pflug, and Ruszczynski [9]. Schultz, Stougie, and Van der Vlerk [10] suggested an algebraic approach to solving stochastic programs with integer recourse by using a framework of Gröbner basis reductions.
In this paper we study a Monte Carlo simulation-based approach to stochastic discrete optimization problems. The basic idea is simple: a random sample of W is generated, and the expected value function is approximated by the corresponding sample average function. The obtained sample average optimization problem is solved, and the procedure is repeated several times until a stopping criterion is satisfied. The idea of using sample average approximations for solving stochastic programs is a natural one and has been used by various authors over the years. Such an approach was used in the context of a stochastic knapsack problem in a recent paper of Morton and Wood [7].
† School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0205.
‡ Supported by the National Science Foundation under grant DMI-9875400.
§ Supported by the National Science Foundation under grant DMI-9713878.

The organization of this paper is as follows. In the next section we discuss the statistical inference of the sample average approximation method. In particular, we show that, with probability approaching one exponentially fast as the sample size increases, an optimal solution of the sample average approximation problem provides an exact optimal solution of the "true" problem (1.1). In section 3 we outline an algorithm design for the sample average approximation approach to solving (1.1), and in particular we discuss various stopping rules. In section 4 we present a numerical example of the sample average approximation method applied to a stochastic knapsack problem, and section 5 gives conclusions.
2. Convergence Results. As mentioned in the introduction, we are interested in solving stochastic discrete optimization problems of the form (1.1). Let W^1, ..., W^N be an i.i.d. random sample of N realizations of the random vector W. Consider the sample average function

    ĝ_N(x) := (1/N) Σ_{n=1}^{N} G(x, W^n)

and the associated problem

    min_{x ∈ S} ĝ_N(x).    (2.1)

We refer to (1.1) and (2.1) as the "true" (or expected value) and sample average approximation (SAA) problems, respectively. Note that IE[ĝ_N(x)] = g(x).
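To make the SAA scheme concrete, here is a minimal sketch (not the authors' code) that solves one SAA replication of a hypothetical stochastic knapsack-style instance by enumerating the finite feasible set S. The item data, penalty coefficient, and distribution of W are invented for illustration.

```python
import itertools
import random

def saa_solve(G, S, sample):
    """Solve the SAA problem min_{x in S} (1/N) * sum_n G(x, W^n) by enumeration."""
    def g_hat(x):
        return sum(G(x, w) for w in sample) / len(sample)
    x_best = min(S, key=g_hat)
    return x_best, g_hat(x_best)

# Hypothetical instance: choose a subset x of 4 items with random sizes W_i;
# G is a cost (rewards enter with a minus sign), with a penalty for
# exceeding the capacity c.
rewards = [4.0, 3.0, 2.5, 2.0]
capacity = 5.0

def G(x, w):
    size = sum(w[i] for i in range(4) if x[i])
    reward = sum(rewards[i] for i in range(4) if x[i])
    return -reward + 10.0 * max(0.0, size - capacity)

S = list(itertools.product([0, 1], repeat=4))  # finite feasible set, |S| = 16

rng = random.Random(0)
sample = [[rng.uniform(1.0, 3.0) for _ in range(4)] for _ in range(200)]
x_hat, v_hat = saa_solve(G, S, sample)
```

Repeating this for several independent samples, and comparing the resulting solutions x̂ across replications, is the basis of the stopping rules discussed in section 3.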
Since the feasible set S is finite, problems (1.1) and (2.1) have nonempty sets of optimal solutions, denoted S* and Ŝ_N, respectively. Let v* and v̂_N denote the optimal values,

    v* := min_{x ∈ S} g(x)   and   v̂_N := min_{x ∈ S} ĝ_N(x),

of the respective problems. We also consider sets of ε-optimal solutions. That is, for ε ≥ 0, we say that x̄ is an ε-optimal solution of (1.1) if x̄ ∈ S and g(x̄) ≤ v* + ε. The sets of all ε-optimal solutions of (1.1) and (2.1) are denoted by S^ε and Ŝ^ε_N, respectively. Clearly, for ε = 0, the set S^ε coincides with S*, and Ŝ^ε_N coincides with Ŝ_N.
2.1. Convergence of Objective Values and Solutions. In the following proposition we show convergence with probability one (w.p.1) of the above statistical estimators. By the statement "an event happens w.p.1 for N large enough" we mean that for P-almost every realization ω = {W^1, W^2, ...} of the random sequence, there exists an integer N(ω) such that the considered event happens for all samples {W^1, ..., W^n} from ω with n ≥ N(ω). Note that in such a statement the integer N(ω) depends on the sequence ω of realizations and therefore is random.

Proposition 2.1. The following two properties hold: (i) v̂_N → v* w.p.1 as N → ∞, and (ii) for any ε ≥ 0, the event {Ŝ^ε_N ⊂ S^ε} happens w.p.1 for N large enough.

Proof. By the strong Law of Large Numbers we have that for any x ∈ S, ĝ_N(x) converges to g(x) w.p.1 as N → ∞. Since the set S is finite, and the union of a finite number of sets each of measure zero also has measure zero, it follows that w.p.1, ĝ_N(x) converges to g(x) uniformly in x ∈ S. That is, w.p.1,

    δ_N := max_{x ∈ S} |ĝ_N(x) − g(x)| → 0   as N → ∞.    (2.2)

Since |v̂_N − v*| ≤ δ_N, it follows that w.p.1, v̂_N → v* as N → ∞.

For a given ε ≥ 0, consider the number

    α(ε) := min_{x ∈ S\S^ε} g(x) − v* − ε.    (2.3)

Since for any x ∈ S\S^ε it holds that g(x) > v* + ε, and the set S is finite, it follows that α(ε) > 0.

Let N be large enough such that δ_N < α(ε)/2. Then v̂_N < v* + α(ε)/2, and for any x ∈ S\S^ε it holds that ĝ_N(x) > v* + ε + α(ε)/2. It follows that if x ∈ S\S^ε, then ĝ_N(x) > v̂_N + ε, and hence x does not belong to the set Ŝ^ε_N. The inclusion Ŝ^ε_N ⊂ S^ε follows, which completes the proof.
It follows that if, for some ε ≥ 0, S^ε = {x*} is a singleton, then w.p.1, Ŝ^ε_N = {x*} for N large enough. In particular, if the true problem (1.1) has a unique optimal solution x*, then w.p.1, for sufficiently large N the approximating problem (2.1) has a unique optimal solution x̂_N and x̂_N = x*.

In the next section, and in section 4, it is demonstrated that α(ε), defined in (2.3), is an important measure of the well-conditioning of a stochastic discrete optimization problem.
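Whenever the true values g(x) are known, the conditioning number α(ε) of (2.3) is straightforward to compute by enumeration. The sketch below does so for a small hypothetical table of objective values (the values are invented for illustration).

```python
def alpha(eps, g):
    """Compute alpha(eps) = min over x in S \\ S^eps of g(x) - v* - eps.

    g maps each feasible x to its true objective value; returns +inf when
    every x is already eps-optimal (i.e., S \\ S^eps is empty).
    """
    v_star = min(g.values())
    outside = [gx for gx in g.values() if gx > v_star + eps]
    return min(outside) - v_star - eps if outside else float("inf")

# Hypothetical true objective values on a 4-point feasible set.
g = {"a": 1.0, "b": 1.3, "c": 2.0, "d": 5.0}
a0 = alpha(0.0, g)   # gap between the best and second-best value: 0.3
a5 = alpha(0.5, g)   # 2.0 - 1.0 - 0.5 = 0.5
```

A small α(ε), as in this toy table with ε = 0, signals that near-optimal solutions crowd the optimum, so a more accurate sample average approximation is needed to separate them.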
2.2. Convergence Rates. The above results do not say anything about the rates of convergence of v̂_N and Ŝ^ε_N to their true counterparts. In this section we investigate such rates of convergence. By using the theory of Large Deviations (LD) we show that, under mild regularity conditions, the probability of the event {Ŝ^ε_N ⊂ S^ε} approaches one exponentially fast as N → ∞. Next we briefly outline some background of the LD theory.

Consider an i.i.d. sequence X_1, ..., X_N of replications of a random variable X, and let Z_N := N^{−1} Σ_{i=1}^{N} X_i be the corresponding sample average. Then for any real numbers a and t ≥ 0 we have that P(Z_N ≥ a) = P(e^{tZ_N} ≥ e^{ta}), and hence, by Chebyshev's inequality,

    P(Z_N ≥ a) ≤ e^{−ta} IE[e^{tZ_N}] = e^{−ta} [M(t/N)]^N,

where M(t) := IE{e^{tX}} is the moment generating function of X. By taking the logarithm of both sides of the above inequality, changing variables t′ := t/N, and minimizing over t′ > 0, we obtain

    (1/N) log [P(Z_N ≥ a)] ≤ −I(a),    (2.4)

where

    I(z) := sup_{t ≥ 0} {tz − Λ(t)}

is the conjugate of the logarithmic moment generating function Λ(t) := log M(t). In LD theory, I(z) is called the large deviations rate function, and the inequality (2.4) corresponds to the upper bound of Cramér's LD theorem.

Although we do not need this in the subsequent analysis, it could be mentioned that the constant I(a) in (2.4) gives, in a sense, the best possible exponential rate at which the probability P(Z_N ≥ a) converges to zero. This follows from the corresponding lower bound of Cramér's LD theorem. For a thorough discussion of the LD theory, an interested reader is referred to Dembo and Zeitouni [4].
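The upper bound (2.4) can be checked numerically. For X ~ Bernoulli(p) the rate function has the well-known closed form I(a) = a log(a/p) + (1 − a) log((1 − a)/(1 − p)); the sketch below (a sanity check, not part of the paper) compares the resulting Chernoff bound with the exact binomial tail probability for p = 1/2, with parameter choices made up for illustration.

```python
import math

def bernoulli_rate(a, p):
    """Large-deviations rate function I(a) for X ~ Bernoulli(p), with p < a < 1."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

p, a, N = 0.5, 0.7, 200
chernoff = math.exp(-N * bernoulli_rate(a, p))  # bound (2.4): P(Z_N >= a) <= e^{-N I(a)}

# Exact tail P(Z_N >= a): Binomial(N, 1/2) puts mass C(N, k) / 2^N on k successes.
k0 = math.ceil(a * N)
exact = sum(math.comb(N, k) for k in range(k0, N + 1)) / 2**N
```

Here the exact tail indeed lies below the Chernoff bound, and both decay exponentially in N, as the LD upper bound predicts.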

The rate function I(z) has the following properties. Suppose that the random variable X has mean µ. Then the function I(z) is convex, attains its minimum at z = µ, and I(µ) = 0. Moreover, suppose that the moment generating function M(t) of X is finite valued for all t in a neighborhood of t = 0. Then it follows by the dominated convergence theorem that M(t), and hence the function Λ(t), are infinitely differentiable at t = 0, and Λ′(0) = µ. Consequently, for a > µ the derivative of ψ(t) := ta − Λ(t) at t = 0 is greater than zero, and hence ψ(t) > 0 for t > 0 small enough. It follows that in that case I(a) > 0.
Now we return to the problems (1.1) and (2.1). Consider ε ≥ 0 and the numbers δ_N and α(ε) defined in (2.2) and (2.3), respectively. Then it holds that if δ_N < α(ε)/2, then {Ŝ^ε_N ⊂ S^ε}. Since the complement of the event {δ_N < α(ε)/2} is given by the union of the events {|ĝ_N(x) − g(x)| ≥ α(ε)/2} over all x ∈ S, and the probability of that union is less than or equal to the sum of the corresponding probabilities, it follows that

    1 − P(Ŝ^ε_N ⊂ S^ε) ≤ Σ_{x ∈ S} P{|ĝ_N(x) − g(x)| ≥ α(ε)/2}.
We make the following assumption.

Assumption A. For any x ∈ S, the moment generating function M(t) of the random variable G(x, W) is finite valued in a neighborhood of t = 0.

Under Assumption A, it follows from the LD upper bound (2.4) that for any x ∈ S there are constants γ_x > 0 and γ′_x > 0 such that

    P{|ĝ_N(x) − g(x)| ≥ α(ε)/2} ≤ e^{−Nγ_x} + e^{−Nγ′_x}.

Namely, the constants γ_x and γ′_x are given by the values of the rate functions of G(x, W) and −G(x, W) at g(x) + α(ε)/2 and −g(x) + α(ε)/2, respectively. Since the set S is finite, by taking γ := min_{x ∈ S} {γ_x, γ′_x}, the following result is obtained (it is similar to an asymptotic result for piecewise linear continuous problems derived in [12]).
Proposition 2.2. Suppose that Assumption A holds. Then there exists a constant γ > 0 such that the following inequality holds:

    lim sup_{N → ∞} (1/N) log [1 − P(Ŝ^ε_N ⊂ S^ε)] ≤ −γ.    (2.5)

The inequality (2.5) means that the probability of the event {Ŝ^ε_N ⊂ S^ε} approaches one exponentially fast as N → ∞. Unfortunately, it appears that the corresponding constant γ, giving the exponential rate of convergence, cannot be calculated (or even estimated) a priori, i.e., before the problem is solved. Therefore the above result is more of theoretical value. Let us mention at this point that the above constant γ depends, through the corresponding rate functions, on the number α(ε). Clearly, if α(ε) is "small", then an accurate approximation would be required in order to find an ε-optimal solution of the true problem. Therefore, in a sense, α(ε) characterizes the well-conditioning of the set S^ε.
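The exponential convergence in Proposition 2.2 can be observed empirically by running many independent SAA replications on a toy instance and recording how often the SAA optimal set lands inside S^ε. Everything below (the means, the noise model, the sample sizes) is invented for the experiment.

```python
import random
import statistics

mu = [0.0, 0.3, 0.6]  # true objective values; unique optimum x* = 0

def saa_optimal_set(rng, N, eps):
    """eps-optimal set of one SAA replication, independent N(0,1) noise per x."""
    g_hat = [m + statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(N))
             for m in mu]
    v_hat = min(g_hat)
    return {x for x, v in enumerate(g_hat) if v <= v_hat + eps}

def success_rate(rng, N, reps=500):
    """Fraction of replications in which the SAA optimal set lies inside {0}."""
    return sum(saa_optimal_set(rng, N, 0.0) <= {0} for _ in range(reps)) / reps

rng = random.Random(3)
low, high = success_rate(rng, 10), success_rate(rng, 200)  # high should exceed low
```

With the moderate gaps chosen here the success probability is visibly below one at N = 10 and very close to one at N = 200, consistent with the exponential rate (though, as noted above, the rate constant γ itself is not observable a priori).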
Next we discuss the asymptotics of the SAA optimal objective value v̂_N. For any subset S′ of S the inequality v̂_N ≤ min_{x ∈ S′} ĝ_N(x) holds. In particular, by taking S′ = S* we obtain that v̂_N ≤ min_{x ∈ S*} ĝ_N(x), and hence

    IE[v̂_N] ≤ IE[min_{x ∈ S*} ĝ_N(x)] ≤ min_{x ∈ S*} IE[ĝ_N(x)] = v*.
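This downward bias is easy to see in simulation. The toy instance below (three candidate solutions with independent Gaussian noise; all numbers invented) estimates IE[v̂_N] − v* by averaging over many replications.

```python
import random
import statistics

mu = [1.0, 1.1, 1.2]   # true objective values g(x); v* = 1.0
v_star = min(mu)

def v_hat(rng, N):
    """Optimal value of one SAA replication with sample size N."""
    g_hat = [m + statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(N))
             for m in mu]
    return min(g_hat)

rng = random.Random(42)
reps = [v_hat(rng, 50) for _ in range(2000)]
bias_estimate = statistics.fmean(reps) - v_star  # comes out negative
```

Because the three candidate values lie within about one standard error (1/√50 ≈ 0.14) of each other, the minimum is taken partly over noise, which drags IE[v̂_N] below v*; with a larger gap between the g-values, or a larger N, the bias shrinks.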

That is, the estimator v̂_N has a negative bias (cf. Mak, Morton, and Wood [6]).

It follows from Proposition 2.1 that w.p.1, for N sufficiently large, the set Ŝ_N of optimal solutions of the SAA problem is included in S*. In that case we have that

    v̂_N = min_{x ∈ Ŝ_N} ĝ_N(x) ≥ min_{x ∈ S*} ĝ_N(x).

Since the opposite inequality always holds, it follows that w.p.1, v̂_N − min_{x ∈ S*} ĝ_N(x) = 0 for N large enough. Multiplying both sides of this equation by √N, we obtain that w.p.1, √N [v̂_N − min_{x ∈ S*} ĝ_N(x)] = 0 for N large enough, and hence

    lim_{N → ∞} √N [v̂_N − min_{x ∈ S*} ĝ_N(x)] = 0   w.p.1.    (2.6)

Since convergence w.p.1 implies convergence in probability, it follows from (2.6) that √N [v̂_N − min_{x ∈ S*} ĝ_N(x)] converges in probability to zero, i.e.,

    v̂_N = min_{x ∈ S*} ĝ_N(x) + o_p(N^{−1/2}).
Furthermore, since v* = g(x) for any x ∈ S*, it follows that

    √N [min_{x ∈ S*} ĝ_N(x) − v*] = √N min_{x ∈ S*} [ĝ_N(x) − v*] = min_{x ∈ S*} √N [ĝ_N(x) − g(x)].

Suppose that for every x ∈ S the variance

    σ²(x) := Var{G(x, W)}    (2.7)

exists. Then it follows by the Central Limit Theorem (CLT) that, for any x ∈ S, √N [ĝ_N(x) − g(x)] converges in distribution to a normally distributed variable Y(x) with zero mean and variance σ²(x). Moreover, again by the CLT, the random variables Y(x) have the same autocovariance function as G(x, W), i.e., the covariance between Y(x) and Y(x′) is equal to the covariance between G(x, W) and G(x′, W) for any x, x′ ∈ S. Hence the following result is obtained (it is similar to an asymptotic result for stochastic programs with continuous decision variables which was derived in [11]). We use ⇒ to denote convergence in distribution.

Proposition 2.3. Suppose that the variances σ²(x), defined in (2.7), exist for every x ∈ S*. Then

    √N (v̂_N − v*) ⇒ min_{x ∈ S*} Y(x),    (2.8)

where the Y(x) are normally distributed random variables with zero mean and the autocovariance function given by the corresponding autocovariance function of G(x, W). In particular, if S* = {x*} is a singleton, then

    √N (v̂_N − v*) ⇒ N(0, σ²(x*)).    (2.9)
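The singleton-case limit (2.9) can also be checked by simulation: rescale v̂_N − v* by √N over many replications and compare the empirical spread with σ(x*). The instance below (unique optimum, σ(x*) = 0.5, independent noise per solution) is made up for the check.

```python
import math
import random
import statistics

mu = [0.0, 1.0, 2.0]       # unique optimum x* = 0, so v* = 0
sigma_star = 0.5           # standard deviation of G(x, W) around its mean

def v_hat(rng, N):
    """Optimal value of one SAA replication with sample size N."""
    g_hat = [m + statistics.fmean(rng.gauss(0.0, sigma_star) for _ in range(N))
             for m in mu]
    return min(g_hat)

rng = random.Random(7)
N = 100
scaled = [math.sqrt(N) * v_hat(rng, N) for _ in range(2000)]  # sqrt(N)(v_hat - v*)
mean_hat = statistics.fmean(scaled)   # should be near 0
sd_hat = statistics.stdev(scaled)     # should be near sigma_star = 0.5
```

Because the optimality gap (1.0) dwarfs the sampling noise at N = 100, the SAA minimizer is almost always x*, and the rescaled values behave like draws from N(0, σ²(x*)), as (2.9) asserts.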
3. Algorithm Design. In the previous section we established a number of convergence results for the sample average approximation method. The results describe how the optimal value v̂_N and the set Ŝ^ε_N of ε-optimal solutions of the SAA problem converge to their true counterparts v* and S^ε, respectively, as the sample size N increases. These results provide some theoretical justification for the proposed method. When designing an algorithm for solving stochastic discrete optimization problems, many additional issues have to be addressed. Some of these issues are discussed in this section.

References

Dembo, A., and Zeitouni, O., Large Deviations Techniques and Applications (book).

Birge, J. R., and Louveaux, F., Introduction to Stochastic Programming (book).

Multiple Comparison Procedures (book).

Mak, Morton, and Wood, Monte Carlo bounding techniques for determining solution quality in stochastic programs (journal article).
Frequently Asked Questions (11)
Q1. What are the contributions in "The sample average approximation method for stochastic discrete optimization"?

In this paper the authors study a Monte Carlo simulation-based approach to stochastic discrete optimization problems. The authors discuss convergence rates and stopping rules of this procedure and present a numerical example of the stochastic knapsack problem.

The fourth component, z_α (S²_{N′}(x̂)/N′ + S²_M/M)^{1/2}, can also be made small with relatively little computational effort by choosing N′ and M sufficiently large.

The bias seems to decrease more slowly for the instances with more decision variables than for the instances with fewer decision variables.

It was found that this convergence rate depends on the well-conditioning of the problem, which in turn tends to become poorer as the number of decision variables increases.

As mentioned above, in the second numerical experiment it was noticed that the optimality gap estimator is often large even if an optimal solution has been found, i.e., v* − g(x̂) = 0 (which is also a common problem in deterministic discrete optimization).

The first component, g(x̂) − ĝ_{N′}(x̂), can be made small with relatively little computational effort by choosing N′ sufficiently large.

It was shown that the probability that a replication of the SAA method produces an optimal solution increases at an exponential rate in the sample size N.

For the harder instance with 20 decision variables (instance 20D), the optimal solution was not produced in any of the 270 replications (but the second-best solution was produced 3 times); for instance 20R1, the optimal solution was first produced after m = 12 replications with sample size N = 150; and for instance 20R5, the optimal solution was first produced after m = 15 replications with sample size N = 50.

The second component, the true optimality gap v* − g(x̂), is often small after only a few replications m with a small sample size N.

The most noticeable effect is that the bias decreases much more slowly for the harder instances than for the randomly generated instances as the sample size N increases.

A more efficient optimality gap estimator can make a substantial contribution toward improving the performance guarantees of the SAA method during execution of the algorithm.