Journal ArticleDOI

Random Directions Stochastic Approximation With Deterministic Perturbations

TL;DR: The gradient and/or Hessian estimates in the resulting algorithms with DPs are shown to be asymptotically unbiased, so the algorithms are provably convergent, and convergence rates are derived to establish the superiority of the first-order and second-order algorithms.
Abstract: We introduce deterministic perturbation (DP) schemes for the recently proposed random directions stochastic approximation, and propose new first-order and second-order algorithms. In the latter case, these are the first second-order algorithms to incorporate DPs. We show that the gradient and/or Hessian estimates in the resulting algorithms with DPs are asymptotically unbiased, so that the algorithms are provably convergent. Furthermore, we derive convergence rates to establish the superiority of the first-order and second-order algorithms, for the special case of a convex and quadratic optimization problem, respectively. Numerical experiments are used to validate the theoretical results.
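As a rough illustration of the deterministic-perturbation idea (a sketch, not the paper's exact algorithm), the first-order scheme below cycles the perturbation direction deterministically through the rows of a permutation matrix instead of sampling it at random; the gain sequences `a` and `delta` and their decay rates are illustrative choices.

```python
import numpy as np

def rdsa_dp(f, x0, n_iters=500, a=1.0, delta=0.1):
    """First-order RDSA-style sketch with deterministic perturbations:
    directions cycle through the rows of a permutation matrix
    (here the identity) rather than being drawn at random."""
    x = np.asarray(x0, dtype=float)
    dim = len(x)
    P = np.eye(dim)                        # identity is a valid permutation matrix
    for n in range(n_iters):
        a_n = a / (n + 1)                  # vanishing step size
        delta_n = delta / (n + 1) ** 0.25  # vanishing perturbation size
        d_n = P[n % dim]                   # deterministic perturbation direction
        y_plus = f(x + delta_n * d_n)      # two function measurements
        y_minus = f(x - delta_n * d_n)
        g_hat = d_n * (y_plus - y_minus) / (2.0 * delta_n)  # gradient estimate
        x = x - a_n * g_hat                # stochastic-approximation update
    return x
```

For a quadratic objective the two-point difference is exact along each direction, so the iterates settle near the minimizer.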
Citations
Proceedings ArticleDOI
01 Jul 2020
TL;DR: In this article, the problem of real-time optimization of networked systems is examined, and online algorithms are developed that steer the system towards the optimal trajectory without explicit knowledge of the system model.
Abstract: This paper examines the problem of real-time optimization of networked systems and develops online algorithms that steer the system towards the optimal trajectory without explicit knowledge of the system model. The problem is modeled as a dynamic optimization problem with time-varying performance objectives and engineering constraints. The design of the algorithms leverages the online zero-order primal-dual projected-gradient method. In particular, the primal step that involves the gradient of the objective function (and hence requires a networked systems model) is replaced by its zero-order approximation with two function evaluations using a deterministic perturbation signal. The evaluations are performed using the measurements of the system output, hence giving rise to a feedback interconnection, with the optimization algorithm serving as a feedback controller. The paper provides some insights on the stability and tracking properties of this interconnection. Finally, the paper applies this methodology to a real-time optimal power flow problem in power systems, and shows its efficacy on the IEEE 37-node distribution test feeder for reference power tracking and voltage regulation.

13 citations

Posted Content
TL;DR: This paper examines the problem of real-time optimization of networked systems, develops online algorithms that steer the system towards the optimal trajectory without explicit knowledge of the system model, and leverages the online zero-order primal-dual projected-gradient method.
Abstract: This paper examines the problem of real-time optimization of networked systems and develops online algorithms that steer the system towards the optimal trajectory without explicit knowledge of the system model. The problem is modeled as a dynamic optimization problem with time-varying performance objectives and engineering constraints. The design of the algorithms leverages the online zero-order primal-dual projected-gradient method. In particular, the primal step that involves the gradient of the objective function (and hence requires a networked systems model) is replaced by its zero-order approximation with two function evaluations using a deterministic perturbation signal. The evaluations are performed using the measurements of the system output, hence giving rise to a feedback interconnection, with the optimization algorithm serving as a feedback controller. The paper provides some insights on the stability and tracking properties of this interconnection. Finally, the paper applies this methodology to a real-time optimal power flow problem in power systems, and shows its efficacy on the IEEE 37-node distribution test feeder for reference power tracking and voltage regulation.

8 citations


Cites methods from "Random Directions Stochastic Approx..."

  • ...In [10], [11], the SPSA algorithm was extended to deterministic perturbations, to improve convergence rates under the assumption of a vanishing stepsize and vanishing quasi-noise....


Posted Content
26 Feb 2020
TL;DR: This work introduces an optimization oracle to capture a setting where the function measurements have an estimation error that can be controlled, and provides a guideline for choosing the batch size for estimation, so that the overall error bound matches with the one obtained when there is no estimation error.
Abstract: We consider the problem of optimizing an objective function with and without convexity in a simulation-optimization context, where only stochastic zeroth-order information is available. We consider two techniques for estimating gradient/Hessian, namely simultaneous perturbation (SP) and Gaussian smoothing (GS). We introduce an optimization oracle to capture a setting where the function measurements have an estimation error that can be controlled. Our oracle is appealing in several practical contexts where the objective has to be estimated from i.i.d. samples, and increasing the number of samples reduces the estimation error. In the stochastic non-convex optimization context, we analyze the zeroth-order variant of the randomized stochastic gradient (RSG) and quasi-Newton (RSQN) algorithms with a biased gradient/Hessian oracle, and with its variant involving an estimation error component. In particular, we provide non-asymptotic bounds on the performance of both algorithms, and our results provide a guideline for choosing the batch size for estimation, so that the overall error bound matches with the one obtained when there is no estimation error. Next, in the stochastic convex optimization setting, we provide non-asymptotic bounds that hold in expectation for the last iterate of a stochastic gradient descent (SGD) algorithm, and our bound for the GS variant of SGD matches the bound for SGD with unbiased gradient information. We perform simulation experiments on synthetic as well as real-world datasets, and the empirical results validate the theoretical findings.
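The batch-controlled estimation error can be illustrated with a toy oracle (hypothetical names, not the paper's construction): the function value at x is estimated as the mean of batch_size i.i.d. noisy samples, so the estimation error decays as O(1/sqrt(batch_size)).

```python
import random

def batch_oracle(sample, x, batch_size):
    """Controllable-error oracle sketch: estimate f(x) as the mean of
    batch_size i.i.d. noisy samples of the objective at x; increasing
    batch_size reduces the estimation error at O(1/sqrt(batch_size))."""
    return sum(sample(x) for _ in range(batch_size)) / batch_size
```

An optimizer driven by such an oracle can, per the abstract, match the error bound of the estimation-error-free setting by choosing the batch size appropriately.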

2 citations


Cites background or methods from "Random Directions Stochastic Approx..."

  • ...RDSA with permutation matrix-based deterministic perturbations (RDSA-Perm-DP) [15]: Let y^+_m = f(x + η_m d_m) + ξ^+_m and y^−_m = f(x − η_m d_m) + ξ^−_m, where ξ^+_m and ξ^−_m denote the measurement noise....


  • ...The reader is referred to Lemma 6 in [15] or Lemma 7....


  • ...A similar noise structure has been used earlier in the study of SP methods (cf.[15, 17])....


  • ...0001, see [14]); and (iv) 1RDSA-Perm-DP and 2RDSA-Perm-DP: these are the recently proposed first- and second-order variants of RDSA, where the perturbations are non-random and instead use the rows of a permutation matrix [15]....


  • ...If the function f is three-times continuously differentiable, then the constants c1 and c2 are as follows (see [16, 14, 15]): c1 = α_0 d^3 and c2 = α_1 d, where the constant α_0 depends on the second moment of the random perturbation employed in the gradient estimate and on a bound on the third derivative of the objective f....


Posted Content
TL;DR: In this paper, the authors introduce biased gradient oracles to capture a setting where the function measurements have an estimation error that can be controlled through a batch size parameter, and highlight the applicability of their biased gradients in a risk-sensitive reinforcement learning setting.
Abstract: We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error that can be controlled through a batch size parameter. Our proposed oracles are appealing in several practical contexts, for instance, risk measure estimation from a batch of independent and identically distributed (i.i.d.) samples, or simulation optimization, where the function measurements are `biased' due to computational constraints. In either case, increasing the batch size reduces the estimation error. We highlight the applicability of our biased gradient oracles in a risk-sensitive reinforcement learning setting. In the stochastic non-convex optimization context, we analyze a variant of the randomized stochastic gradient (RSG) algorithm with a biased gradient oracle. We quantify the convergence rate of this algorithm by deriving non-asymptotic bounds on its performance. Next, in the stochastic convex optimization setting, we derive non-asymptotic bounds for the last iterate of a stochastic gradient descent (SGD) algorithm with a biased gradient oracle.

1 citation

References
Journal ArticleDOI
TL;DR: In this article, a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability is presented.
Abstract: Let M(x) denote the expected value at level x of the response to a certain experiment. M(x) is assumed to be a monotone function of x but is unknown to the experimenter, and it is desired to find the solution x = θ of the equation M(x) = α, where α is a given constant. We give a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability.
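A minimal sketch of the scheme, assuming an illustrative linear response with additive noise and the classic step sizes a_n = 1/n:

```python
import random

def robbins_monro(observe, alpha, x1=0.0, n_iters=5000):
    """Robbins-Monro sketch for solving M(x) = alpha when only noisy
    observations y_n = M(x_n) + noise are available. The step sizes
    a_n = 1/n satisfy sum a_n = inf and sum a_n**2 < inf."""
    x = x1
    for n in range(1, n_iters + 1):
        y = observe(x)              # noisy response at level x
        x = x - (y - alpha) / n     # step against the observed error
    return x
```

With M(x) = 2x and alpha = 4, the iterates tend to the root θ = 2 despite the noise.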

9,312 citations


"Random Directions Stochastic Approx..." refers background in this paper

  • ...Robbins and Monro [2] developed an incremental-update algorithm that estimates the zeros of the...


  • ...[22] H. F. Chen, L. Guo, and A. J. Gao, “Convergence and robustness of the Robbins-Monro algorithm truncated at randomly varying bounds,” Stochastic Processes Appl., vol. 27, pp. 217–231, 1987....


  • ...[2] H. Robbins and S. Monro, “A stochastic approximation method,” Ann....



BookDOI
TL;DR: This book provides the first simultaneous coverage of the statistical aspects of simulation and Monte Carlo methods, their commonalities and their differences for the solution of a wide spectrum of engineering and scientific problems.
Abstract: From the Publisher: Provides the first simultaneous coverage of the statistical aspects of simulation and Monte Carlo methods, their commonalities and their differences for the solution of a wide spectrum of engineering and scientific problems. Contains standard material usually considered in Monte Carlo simulation as well as new material such as variance reduction techniques, regenerative simulation, and Monte Carlo optimization.

2,776 citations

Journal ArticleDOI
TL;DR: The paper presents an SA algorithm based on a simultaneous perturbation gradient approximation instead of the standard finite-difference approximation of Kiefer-Wolfowitz-type procedures, and the algorithm can be significantly more efficient than the standard algorithms in large-dimensional problems.
Abstract: The problem of finding a root of the multivariate gradient equation that arises in function minimization is considered. When only noisy measurements of the function are available, a stochastic approximation (SA) algorithm of the general Kiefer-Wolfowitz type is appropriate for estimating the root. The paper presents an SA algorithm that is based on a simultaneous perturbation gradient approximation instead of the standard finite-difference approximation of Kiefer-Wolfowitz-type procedures. Theory and numerical experience indicate that the algorithm can be significantly more efficient than the standard algorithms in large-dimensional problems.
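As an illustration (a sketch, not Spall's exact formulation), the simultaneous perturbation estimate perturbs all coordinates at once with a random ±1 vector, so two measurements suffice in any dimension:

```python
import numpy as np

def spsa_gradient(f, x, delta, rng):
    """Two-measurement simultaneous perturbation gradient estimate
    (sketch). A Rademacher (+/-1) vector perturbs every coordinate
    at once, unlike coordinate-wise finite differences, which need
    2*dim measurements per estimate."""
    d = rng.choice([-1.0, 1.0], size=x.shape)      # simultaneous perturbation
    y_plus = f(x + delta * d)
    y_minus = f(x - delta * d)
    return (y_plus - y_minus) / (2.0 * delta * d)  # elementwise division
```

Each single estimate is noisy, but averaging many estimates recovers the true gradient.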

2,149 citations


"Random Directions Stochastic Approx..." refers background or methods in this paper

  • ...[10] J. C. Spall, “A one-measurement form of simultaneous perturbation stochastic approximation,” Automatica, vol. 33, no. 1, pp. 109–112, 1997....


  • ...[9] J. C. Spall, “Multivariate stochastic approximation using a simultaneous perturbation gradient approximation,” IEEE Trans....


  • ...Spall [13] presented a simultaneous perturbation estimate of the Hessian that was based on four noisy function measurements....


  • ...In contrast, for the more general case of nonconvex objective f, Chin [7] and Spall [9] are able to establish a rate of O(n^(-1/3)) obtained from an asymptotic mean square error analysis using the second moment of the limiting normal distribution....


  • ...The abovementioned assumptions are common to the analysis of simultaneous perturbation methods, and can be found, for instance, in the context of 1SPSA [9]—see also [21] for the...


Journal ArticleDOI
TL;DR: In this article, the authors give a scheme whereby, starting from an arbitrary point $x_1$, one obtains successively $x_2, x_3, \cdots$ such that $x_n$ converges to the unknown maximizer $\theta$ of the regression function in probability as $n \rightarrow \infty$.
Abstract: Let $M(x)$ be a regression function which has a maximum at the unknown point $\theta$. $M(x)$ is itself unknown to the statistician who, however, can take observations at any level $x$. This paper gives a scheme whereby, starting from an arbitrary point $x_1$, one obtains successively $x_2, x_3, \cdots$ such that $x_n$ converges to $\theta$ in probability as $n \rightarrow \infty$.
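A minimal one-dimensional sketch of the scheme, with illustrative gain sequences a_n = 1/n and c_n = n^(-1/4):

```python
import random

def kiefer_wolfowitz(observe, x1=0.0, n_iters=4000):
    """Kiefer-Wolfowitz sketch: climb toward the maximizer of a
    regression function M using only noisy observations observe(x).
    The finite-difference width c_n shrinks more slowly than the
    step size a_n, so the gradient estimate stays informative."""
    x = x1
    for n in range(1, n_iters + 1):
        a_n, c_n = 1.0 / n, n ** -0.25
        y_plus = observe(x + c_n)       # noisy sample above x
        y_minus = observe(x - c_n)      # noisy sample below x
        x = x + a_n * (y_plus - y_minus) / (2.0 * c_n)  # ascent step
    return x
```

For M(x) = -(x - 1)^2 observed with additive noise, the iterates approach the maximizer θ = 1.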

2,141 citations


"Random Directions Stochastic Approx..." refers methods in this paper


  • ...Remark 1: The classic Kiefer–Wolfowitz (K-W) algorithm [3] obtains 2N function samples per iteration, corresponding to parameters x_n ± δ_n e_i, i = 1, . . . , N, and updates the parameter as follows: x^i_{n+1} = x^i_n − a_n (y^{i+}_n − y^{i−}_n) / (2δ_n), where y^{i±}_n = f(x_n ± δ_n e_i), i = 1, . . . , N....


  • ...The earliest gradient search algorithm in this setting is the Kiefer–Wolfowitz [3] procedure....


Journal ArticleDOI

1,897 citations


"Random Directions Stochastic Approx..." refers background in this paper

  • ...[20] R. Y. Rubinstein and A. Shapiro, Discrete Event Systems: Sensitivity Analysis and Stochastic Optimization by the Score Function Method....


  • ...Katkovnik and Kulchitsky [4] and Rubinstein [5] proposed a random search technique that became known as the smoothed functional (SF) algorithm....


  • ...[5] R. Y. Rubinstein, Simulation and the Monte Carlo Method....
