Weighted Means in Stochastic Approximation of Minima
References
J. C. Spall, "Multivariate stochastic approximation using a simultaneous perturbation gradient approximation," IEEE Trans. Automat. Control, 37 (1992).
J. Kiefer and J. Wolfowitz, "Stochastic estimation of the maximum of a regression function," Ann. Math. Statist., 23 (1952).
B. T. Polyak and A. B. Juditsky, "Acceleration of stochastic approximation by averaging," SIAM J. Control Optim., 30 (1992).
H. J. Kushner and D. S. Clark, Stochastic Approximation Methods for Constrained and Unconstrained Systems, Springer, 1978.
Frequently Asked Questions (10)
Q2. What is the recursion of the gradient?
For recursion (1.1) with gradient estimate (3.1), assume that conditions (A)–(D) hold, that $A := H_f(\vartheta)$ is positive definite, and that $X_n \to \vartheta$ a.s. Let
$$B_n(t) := n^{-1/2}\left\{\sum_{i=1}^{\lfloor nt \rfloor} W_i + (nt - \lfloor nt \rfloor)\, W_{\lfloor nt \rfloor + 1}\right\}.$$
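A Kiefer–Wolfowitz-type recursion of this general shape can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the quadratic objective `noisy_f`, its minimizer, the noise level, and the gain constants in $a_n = a/n$ and $c_n = c\,n^{-1/4}$ are all assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0])  # assumed minimizer (illustrative)

def noisy_f(x):
    """Noisy observation of f(x) = ||x - theta||^2."""
    return np.sum((x - theta) ** 2) + 0.1 * rng.standard_normal()

def fd_gradient(x, c):
    """Two-sided finite-difference gradient estimate with span c."""
    d = len(x)
    g = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = c
        g[i] = (noisy_f(x + e) - noisy_f(x - e)) / (2 * c)
    return g

# Recursion X_{n+1} = X_n - a_n * (gradient estimate), with standard gains.
x = np.zeros(2)
for n in range(1, 3001):
    a_n = 0.5 / n            # step size a_n = a / n
    c_n = 0.5 / n ** 0.25    # span c_n = c * n^{-1/4}
    x = x - a_n * fd_gradient(x, c_n)

# x should now be close to the minimizer theta = (1, -2)
```

The gains satisfy the usual conditions ($\sum a_n = \infty$, $c_n \to 0$, $\sum a_n^2/c_n^2 < \infty$), under which the iterates converge to the minimizer almost surely.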
Q3. What is the proof of the Lemma 7.1?
To show that $B_{n,1}$ converges in distribution to a Brownian motion $B$, and that $B_{n,2}$ converges to zero in probability, the authors apply an invariance principle for martingale difference sequences due to Berger [1].
Q4. What is the recursion of the gradient estimate?
For recursion (1.1) with gradient estimate (4.1), assume that conditions (A), (E), and (F) hold, and that $f$ is bounded from below and has a Lipschitz continuous gradient.
Q5. How can the authors obtain the limit distribution for the two algorithms?
For both algorithms (i) and (ii), with any gradient estimate considered in this paper, the limit distribution can be obtained by Theorem 1 in Walk [24] and by the representations derived in the proofs of Theorems 3.2 and 4.2.
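The averaging idea behind such weighted-mean algorithms (in the spirit of Polyak and Juditsky, cited above) can be sketched in one dimension. This is an illustrative toy, not the paper's algorithms (i) or (ii): the objective, noise model, and gain exponent are assumptions; the point is that the running mean of the iterates is a better estimate of the minimizer than the last iterate when a slower-than-$1/n$ gain is used.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 3.0  # assumed minimizer of f(x) = (x - theta)^2 (illustrative)

x = 0.0    # SA iterate
avg = 0.0  # running (unweighted) mean of the iterates
for n in range(1, 20001):
    grad = 2 * (x - theta) + rng.standard_normal()  # noisy gradient of f
    x -= grad / n ** 0.75                            # slower-than-1/n gain
    avg += (x - avg) / n                             # online mean update

# avg is an averaged estimate of theta; it attains the optimal
# asymptotic variance without tuning the gain constant.
```

The averaged estimate achieves the optimal limit covariance for a wide range of gain sequences, which is exactly the kind of limit-distribution statement the theorem cited here delivers for weighted means.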
Q6. What is the pth derivative of f?
For the second case ($p \ge 3$), the authors assume that there exist $\varepsilon > 0$ and $L$ such that: (B2a) the derivatives of $f$ up to order $p-1$ exist on $U_\varepsilon(\vartheta)$; (B2b) the $p$th derivative of $f$ at $\vartheta$ exists; (B2c) $\|H_f(x) - H_f(y)\| \le \ldots$
Q7. What is the proof of Proposition 4.1?
Note that $\|n^{1/4}\, r(X_n, c_n)\, 1_{\Omega(n)}\| \le C_4\, n^{1/4} \|U_n\|^{1+\tau} + C_5\, n^{-\tau/4}$. In both cases the authors get $T_n - T = o(1)$ almost surely and in $L_1$, and $A_n - A = o(1/\sqrt{n a_n})$ almost surely and in $L_2$, where they have used $2/p \le 1/2 + 1/(2p) < \alpha$ for $p \ge 3$. Thus the assertion follows from Lemma 7.1(a).
Q8. What is the simplest formula to obtain f(xn)?
Taking conditional expectations and using inequalities (6.11) and (6.12), the authors obtain
$$E\big(f(X_{n+1}) \mid \mathcal{G}_n\big) \le f(X_n) - a_n\big(\|\nabla f(X_n)\|^2 - K c_n \|\nabla f(X_n)\|\big) + K a_n^2 \|\nabla f(X_n)\|^2 + \frac{K a_n^2}{c_n^2}\big(E(W_n^2 \mid \mathcal{G}_n) + \ldots$$
Q9. What is the gradient estimate for a fixed c?
The authors get
$$\forall c > 0: \quad 1 \le \min_{a > \beta/(2\lambda_0)} \frac{E(a, c)}{\tilde{E}(-1/p,\, c)} < \sup_{a > \beta/(2\lambda_0)} \frac{E(a, c)}{\tilde{E}(-1/p,\, c)} = \infty. \tag{5.4}$$
Assume that $a_0$ (with $a_0 > \beta/(2\lambda_0)$) minimizes $E(a, c)$ for a fixed $c$.
Q10. What is the difference between the two methods?
At least for second-order polynomials $f$, the FDSA method needs $d$ times more observations than the SPSA method to achieve the same asymptotic level of mean squared error, when the same span $c_n = c\, n^{-\gamma}$ is used (Spall [23]).
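The source of this $d$-fold saving is the per-iteration measurement count: FDSA perturbs each coordinate separately ($2d$ function evaluations per gradient estimate), while SPSA perturbs all coordinates at once with a random sign vector ($2$ evaluations, independent of dimension). The sketch below just counts evaluations; the test function and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
calls = {"n": 0}  # counter for function evaluations

def noisy_f(x):
    """Noisy observation of f(x) = ||x||^2 (illustrative)."""
    calls["n"] += 1
    return np.sum(x ** 2) + 0.01 * rng.standard_normal()

def fdsa_grad(x, c):
    """Finite-difference estimate: 2*d measurements per gradient."""
    d = len(x)
    g = np.empty(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = c
        g[i] = (noisy_f(x + e) - noisy_f(x - e)) / (2 * c)
    return g

def spsa_grad(x, c):
    """Simultaneous perturbation estimate: 2 measurements, any dimension."""
    delta = rng.choice([-1.0, 1.0], size=len(x))  # Rademacher perturbation
    df = noisy_f(x + c * delta) - noisy_f(x - c * delta)
    return df / (2 * c * delta)

d = 10
x = np.ones(d)

calls["n"] = 0
fdsa_grad(x, 0.1)
fdsa_cost = calls["n"]  # 2 * d = 20 evaluations

calls["n"] = 0
spsa_grad(x, 0.1)
spsa_cost = calls["n"]  # 2 evaluations

print(fdsa_cost, spsa_cost)  # prints "20 2"
```

Since both estimates drive the same recursion at comparable asymptotic accuracy, the $d$-fold gap in measurements per step is exactly the $d$-fold observation advantage stated above.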