Journal ArticleDOI

Implementation of the simultaneous perturbation algorithm for stochastic optimization

01 Jul 1998-IEEE Transactions on Aerospace and Electronic Systems (IEEE)-Vol. 34, Iss: 3, pp 817-823
TL;DR: This paper presents a simple step-by-step guide to implementation of SPSA in generic optimization problems and offers some practical suggestions for choosing certain algorithm coefficients.
Abstract: The need for solving multivariate optimization problems is pervasive in engineering and the physical and social sciences. The simultaneous perturbation stochastic approximation (SPSA) algorithm has recently attracted considerable attention for challenging optimization problems where it is difficult or impossible to directly obtain a gradient of the objective function with respect to the parameters being optimized. SPSA is based on an easily implemented and highly efficient gradient approximation that relies on measurements of the objective function, not on measurements of the gradient of the objective function. The gradient approximation is based on only two function measurements (regardless of the dimension of the gradient vector). This contrasts with standard finite-difference approaches, which require a number of function measurements proportional to the dimension of the gradient vector. This paper presents a simple step-by-step guide to implementation of SPSA in generic optimization problems and offers some practical suggestions for choosing certain algorithm coefficients.
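To make the two-measurement gradient estimate concrete, the sketch below implements the basic SPSA iteration in Python. The gain-sequence exponents 0.602 and 0.101 follow the paper's practical suggestions; the remaining constants (a, c, A) and the noisy quadratic test function are placeholder choices for illustration only.

    import numpy as np

    def spsa_minimize(loss, theta0, iterations=1000, a=0.1, c=0.1, A=100,
                      alpha=0.602, gamma=0.101, rng=None):
        """Minimize `loss` using only (possibly noisy) function measurements."""
        rng = np.random.default_rng() if rng is None else rng
        theta = np.asarray(theta0, dtype=float).copy()
        for k in range(iterations):
            a_k = a / (k + 1 + A) ** alpha                       # step-size gain sequence
            c_k = c / (k + 1) ** gamma                           # perturbation-size sequence
            delta = rng.choice([-1.0, 1.0], size=theta.shape)    # simultaneous +/-1 perturbation
            y_plus = loss(theta + c_k * delta)                   # first function measurement
            y_minus = loss(theta - c_k * delta)                  # second (and last) measurement
            g_hat = (y_plus - y_minus) / (2.0 * c_k * delta)     # gradient estimate, any dimension
            theta -= a_k * g_hat
        return theta

    # Example: a noisy 20-dimensional quadratic (placeholder objective).
    rng = np.random.default_rng(0)
    noisy_quadratic = lambda th: float(np.sum(th ** 2) + 0.01 * rng.standard_normal())
    print(spsa_minimize(noisy_quadratic, np.ones(20), rng=rng))

Note that each iteration costs exactly two loss evaluations, regardless of the dimension of theta.
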
Citations
Journal ArticleDOI
14 Sep 2017-Nature
TL;DR: The experimental optimization of Hamiltonian problems with up to six qubits and more than one hundred Pauli terms is demonstrated, determining the ground-state energy for molecules of increasing size, up to BeH2.
Abstract: The ground-state energy of small molecules is determined efficiently using six qubits of a superconducting quantum processor. Quantum simulation is currently the most promising application of quantum computers. However, only a few quantum simulations of very small systems have been performed experimentally. Here, researchers from IBM present quantum simulations of larger systems using a variational quantum eigenvalue solver (or eigensolver), a previously suggested method for quantum optimization. They perform quantum chemical calculations of LiH and BeH2 and an energy minimization procedure on a four-qubit Heisenberg model. Their application of the variational quantum eigensolver is hardware-efficient, which means that it is optimized on the given architecture. Noise is a big problem in this implementation, but quantum error correction could eventually help this experimental set-up to yield a quantum simulation of chemically interesting systems on a quantum computer. Quantum computers can be used to address electronic-structure problems and problems in materials science and condensed matter physics that can be formulated as interacting fermionic problems, problems which stretch the limits of existing high-performance computers1. Finding exact solutions to such problems numerically has a computational cost that scales exponentially with the size of the system, and Monte Carlo methods are unsuitable owing to the fermionic sign problem. These limitations of classical computational methods have made solving even few-atom electronic-structure problems interesting for implementation using medium-sized quantum computers. Yet experimental implementations have so far been restricted to molecules involving only hydrogen and helium2,3,4,5,6,7,8. Here we demonstrate the experimental optimization of Hamiltonian problems with up to six qubits and more than one hundred Pauli terms, determining the ground-state energy for molecules of increasing size, up to BeH2. We achieve this result by using a variational quantum eigenvalue solver (eigensolver) with efficiently prepared trial states that are tailored specifically to the interactions that are available in our quantum processor, combined with a compact encoding of fermionic Hamiltonians9 and a robust stochastic optimization routine10. We demonstrate the flexibility of our approach by applying it to a problem of quantum magnetism, an antiferromagnetic Heisenberg model in an external magnetic field. In all cases, we find agreement between our experiments and numerical simulations using a model of the device with noise. Our results help to elucidate the requirements for scaling the method to larger systems and for bridging the gap between key problems in high-performance computing and their implementation on quantum hardware.
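As a structural illustration only: the variational approach treats the energy expectation value of a parameterized trial state as a classical objective function to be minimized. The sketch below uses a toy two-qubit Hamiltonian, a small layered ansatz, and a generic SciPy optimizer as placeholders; it is not the hardware-efficient ansatz, fermionic encoding, or SPSA routine used in the paper.

    import numpy as np
    from scipy.optimize import minimize

    # Toy two-qubit Hamiltonian and gates (placeholders, not the molecular Hamiltonians).
    I2 = np.eye(2)
    X = np.array([[0.0, 1.0], [1.0, 0.0]])
    Z = np.diag([1.0, -1.0])
    H = np.kron(Z, Z) + 0.5 * (np.kron(X, I2) + np.kron(I2, X))
    CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)

    def ry(t):
        """Single-qubit rotation about the y axis."""
        return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                         [np.sin(t / 2),  np.cos(t / 2)]])

    def energy(theta):
        """Expectation value <psi(theta)|H|psi(theta)> for a small layered ansatz."""
        state = np.zeros(4); state[0] = 1.0                  # start in |00>
        state = np.kron(ry(theta[0]), ry(theta[1])) @ state  # single-qubit rotation layer
        state = CNOT @ state                                 # entangling layer
        state = np.kron(ry(theta[2]), ry(theta[3])) @ state  # second rotation layer
        return float(state @ H @ state)

    result = minimize(energy, x0=0.1 * np.ones(4), method="COBYLA")
    print("variational energy:", result.fun)
    print("exact ground energy:", np.linalg.eigvalsh(H)[0])  # ansatz may not reach it exactly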

2,348 citations


Cites background from "Implementation of the simultaneous ..."

  • ...101} [20], ensuring the smoothest descent along the approximate gradients defined in Eq....


  • ...Their utility spans from combinatorial optimization problems [17, 18] to quantum chemistry in the form of variational quantum eigensolvers (VQEs), where they were introduced to reduce coherence requirements on quantum hardware [4, 19, 20]....


Journal ArticleDOI
TL;DR: A Composite PSO, in which the heuristic parameters of PSO are controlled by a Differential Evolution algorithm during the optimization, is described, and results for many well-known and widely used test functions are given.
Abstract: This paper presents an overview of our most recent results concerning the Particle Swarm Optimization (PSO) method. Techniques for the alleviation of local minima, and for detecting multiple minimizers are described. Moreover, results on the ability of PSO to tackle Multiobjective, Minimax, Integer Programming and ℓ1 errors-in-variables problems, as well as problems in noisy and continuously changing environments, are reported. Finally, a Composite PSO, in which the heuristic parameters of PSO are controlled by a Differential Evolution algorithm during the optimization, is described, and results for many well-known and widely used test functions are given.
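For context, a minimal global-best PSO with the common inertia-weight velocity update is sketched below; the coefficients (w, c1, c2), swarm size, and Rastrigin test function are generic textbook choices, not the specific variants or settings studied in the paper.

    import numpy as np

    def pso_minimize(f, lower, upper, n_particles=30, iterations=200,
                     w=0.7, c1=1.5, c2=1.5, seed=0):
        """Global-best particle swarm optimization (inertia-weight variant)."""
        rng = np.random.default_rng(seed)
        lower, upper = np.asarray(lower, float), np.asarray(upper, float)
        dim = lower.size
        x = rng.uniform(lower, upper, size=(n_particles, dim))     # positions
        v = np.zeros_like(x)                                       # velocities
        pbest = x.copy()                                           # personal best positions
        pbest_val = np.array([f(p) for p in x])
        gbest = pbest[np.argmin(pbest_val)].copy()                 # global best position
        for _ in range(iterations):
            r1, r2 = rng.random((2, n_particles, dim))
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = np.clip(x + v, lower, upper)
            vals = np.array([f(p) for p in x])
            improved = vals < pbest_val
            pbest[improved], pbest_val[improved] = x[improved], vals[improved]
            gbest = pbest[np.argmin(pbest_val)].copy()
        return gbest, float(pbest_val.min())

    # Example: 2-D Rastrigin function, a standard multimodal test problem.
    rastrigin = lambda p: 10 * p.size + float(np.sum(p ** 2 - 10 * np.cos(2 * np.pi * p)))
    print(pso_minimize(rastrigin, lower=np.full(2, -5.12), upper=np.full(2, 5.12)))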

1,436 citations


Cites methods from "Implementation of the simultaneous ..."

  • ...…by means of finite differencing, (5) the simultaneous perturbation stochastic approximation algorithm due to Spall (Spall, 1992; Spall, 1998a; Spall, 1998b), (6) the evolutionary gradient search algorithm of Salomon (Salomon, 1998), (7) the evolution strategy with cumulative mutation…...


Journal ArticleDOI
TL;DR: This review paper will summarize key developments in history matching and then review many of the accomplishments of the past decade, including developments in reparameterization of the model variables, methods for computation of the sensitivity coefficients, and methods for quantifying uncertainty.
Abstract: History matching is a type of inverse problem in which observed reservoir behavior is used to estimate reservoir model variables that caused the behavior. Obtaining even a single history-matched reservoir model requires a substantial amount of effort, but the past decade has seen remarkable progress in the ability to generate reservoir simulation models that match large amounts of production data. Progress can be partially attributed to an increase in computational power, but the widespread adoption of geostatistics and Monte Carlo methods has also contributed indirectly. In this review paper, we will summarize key developments in history matching and then review many of the accomplishments of the past decade, including developments in reparameterization of the model variables, methods for computation of the sensitivity coefficients, and methods for quantifying uncertainty. An attempt has been made to compare representative procedures and to identify possible limitations of each.

726 citations

Journal ArticleDOI
TL;DR: The proposed method is based on a kriging meta-model that provides a global prediction of the objective values and a measure of prediction uncertainty at every point; it shows excellent consistency and efficiency in finding globally optimal solutions.
Abstract: This paper proposes a new method that extends the efficient global optimization to address stochastic black-box systems. The method is based on a kriging meta-model that provides a global prediction of the objective values and a measure of prediction uncertainty at every point. The criterion for the infill sample selection is an augmented expected improvement function with desirable properties for stochastic responses. The method is empirically compared with the revised simplex search, the simultaneous perturbation stochastic approximation, and the DIRECT methods using six test problems from the literature. An application case study on an inventory system is also documented. The results suggest that the proposed method has excellent consistency and efficiency in finding global optimal solutions, and is particularly useful for expensive systems.
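The infill criterion builds on the classical expected-improvement function of efficient global optimization; the paper's augmented version adds corrections for noisy responses, but the underlying quantity can be sketched as below (a simplified, deterministic-response version, not the augmented criterion itself).

    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, best):
        """Classical expected improvement for minimization, given a kriging/GP
        prediction mean `mu` and standard deviation `sigma` at candidate points,
        and the current best observed objective value `best`."""
        sigma = np.maximum(sigma, 1e-12)          # guard against zero predicted variance
        z = (best - mu) / sigma
        return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    # Example: rank three candidate infill points from a hypothetical surrogate model.
    mu = np.array([1.2, 0.8, 1.0])
    sigma = np.array([0.05, 0.40, 0.20])
    print(expected_improvement(mu, sigma, best=0.9))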

632 citations

Journal ArticleDOI
TL;DR: This work compares the performance of eight optimization methods, including gradient descent (with two step-size selection algorithms), quasi-Newton, nonlinear conjugate gradient, Kiefer-Wolfowitz, simultaneous perturbation, Robbins-Monro, and evolution strategy, and shows that the Robbins-Monro method is the best choice in most applications.
Abstract: A popular technique for nonrigid registration of medical images is based on the maximization of their mutual information, in combination with a deformation field parameterized by cubic B-splines. The coordinate mapping that relates the two images is found using an iterative optimization procedure. This work compares the performance of eight optimization methods: gradient descent (with two different step size selection algorithms), quasi-Newton, nonlinear conjugate gradient, Kiefer-Wolfowitz, simultaneous perturbation, Robbins-Monro, and evolution strategy. Special attention is paid to computation time reduction by using fewer voxels to calculate the cost function and its derivatives. The optimization methods are tested on manually deformed CT images of the heart, on follow-up CT chest scans, and on MR scans of the prostate acquired using BFFE, T1, and T2 protocols. Registration accuracy is assessed by computing the overlap of segmented edges. Precision and convergence properties are studied by comparing deformation fields. The results show that the Robbins-Monro method is the best choice in most applications. With this approach, the computation time per iteration can be lowered approximately 500 times without affecting the rate of convergence by using a small subset of the image, randomly selected in every iteration, to compute the derivative of the mutual information. Of the other methods, the quasi-Newton and the nonlinear conjugate gradient method achieve a slightly higher precision, at the price of larger computation times.
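The main computational trick reported here, estimating the derivative of the cost from a small, freshly drawn random subset of voxels in every Robbins-Monro iteration, can be sketched generically as below. The gain constants, subset size, and the toy least-squares problem are placeholders; `grad_per_sample` stands in for the problem-specific mutual-information derivative and is only an assumption of this sketch.

    import numpy as np

    def robbins_monro_subset(grad_per_sample, theta0, n_samples, subset_size=64,
                             iterations=500, a=1.0, A=50, alpha=0.602, seed=0):
        """Robbins-Monro style descent with a stochastic gradient computed from a
        small random subset of samples (e.g. voxels) redrawn at every iteration."""
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, float).copy()
        for k in range(iterations):
            idx = rng.choice(n_samples, size=subset_size, replace=False)  # new voxel subset
            g_hat = grad_per_sample(theta, idx)                           # cheap gradient estimate
            theta -= (a / (k + 1 + A) ** alpha) * g_hat                   # decaying gain sequence
        return theta

    # Toy example: estimate a scalar offset from noisy samples using subset gradients.
    rng = np.random.default_rng(1)
    samples = 0.3 + 0.05 * rng.standard_normal(10000)
    grad = lambda th, idx: np.array([np.mean(th[0] - samples[idx])])  # gradient of 0.5*mean((th - y)^2)
    print(robbins_monro_subset(grad, np.zeros(1), n_samples=samples.size))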

460 citations

References
Journal ArticleDOI
TL;DR: A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point.
Abstract: A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point. The simplex adapts itself to the local landscape, and contracts on to the final minimum. The method is shown to be effective and computationally compact. A procedure is given for the estimation of the Hessian matrix in the neighbourhood of the minimum, needed in statistical estimation problems.
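The simplex method described here is available in standard numerical libraries; a minimal usage sketch follows (the test function, starting point, and tolerances are arbitrary choices).

    import numpy as np
    from scipy.optimize import minimize

    # Minimize the 2-D Rosenbrock function with the Nelder-Mead simplex method.
    rosenbrock = lambda p: (1 - p[0]) ** 2 + 100 * (p[1] - p[0] ** 2) ** 2
    result = minimize(rosenbrock, x0=np.array([-1.2, 1.0]), method="Nelder-Mead",
                      options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 5000})
    print(result.x, result.fun)   # the minimizer is at (1, 1) with value 0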

27,271 citations

Journal ArticleDOI
TL;DR: The paper presents an SA algorithm based on a simultaneous perturbation gradient approximation instead of the standard finite-difference approximation of Kiefer-Wolfowitz-type procedures, and shows that it can be significantly more efficient than the standard algorithms in large-dimensional problems.
Abstract: The problem of finding a root of the multivariate gradient equation that arises in function minimization is considered. When only noisy measurements of the function are available, a stochastic approximation (SA) algorithm of the general Kiefer-Wolfowitz type is appropriate for estimating the root. The paper presents an SA algorithm that is based on a simultaneous perturbation gradient approximation instead of the standard finite-difference approximation of Kiefer-Wolfowitz type procedures. Theory and numerical experience indicate that the algorithm can be significantly more efficient than the standard algorithms in large-dimensional problems.
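The efficiency contrast between the two gradient approximations is purely in measurement count: a two-sided finite-difference estimate perturbs one coordinate at a time and needs 2p function evaluations in p dimensions, whereas the simultaneous perturbation estimate always needs two. A minimal sketch (the perturbation size and test function are placeholders):

    import numpy as np

    def fd_gradient(loss, theta, c=1e-2):
        """Two-sided finite-difference gradient: 2 * len(theta) loss evaluations."""
        g = np.zeros_like(theta)
        for i in range(theta.size):
            e = np.zeros_like(theta)
            e[i] = c
            g[i] = (loss(theta + e) - loss(theta - e)) / (2 * c)
        return g

    def sp_gradient(loss, theta, c=1e-2, rng=None):
        """Simultaneous perturbation gradient: 2 loss evaluations in any dimension."""
        rng = np.random.default_rng(0) if rng is None else rng
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        return (loss(theta + c * delta) - loss(theta - c * delta)) / (2 * c * delta)

    loss = lambda th: float(np.sum(th ** 2))
    theta = np.arange(1.0, 6.0)            # 5-dimensional test point
    print(fd_gradient(loss, theta))        # 10 loss evaluations
    print(sp_gradient(loss, theta))        # 2 loss evaluations (noisier per estimate)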

2,149 citations

Journal ArticleDOI
TL;DR: In this article, the authors give a scheme whereby, starting from an arbitrary point $x_1$, one obtains successively $x_2, x_3, \cdots$ such that $x_n$ converges in probability to the unknown maximizer $\theta$ of the regression function as $n \rightarrow \infty$.
Abstract: Let $M(x)$ be a regression function which has a maximum at the unknown point $\theta$. $M(x)$ is itself unknown to the statistician who, however, can take observations at any level $x$. This paper gives a scheme whereby, starting from an arbitrary point $x_1$, one obtains successively $x_2, x_3, \cdots$ such that $x_n$ converges to $\theta$ in probability as $n \rightarrow \infty$.
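A one-dimensional sketch of the recursion described here: each step compares noisy observations of $M$ on either side of the current iterate and moves uphill. The gain sequences $a_n = 1/n$ and $c_n = n^{-1/3}$ are standard choices satisfying the usual convergence conditions, and the noisy parabola is a placeholder objective.

    import numpy as np

    def stochastic_maximize(observe, x1, iterations=2000):
        """Approximate the maximizer of a regression function M(x) from noisy
        observations `observe(x)`, using the two-sided scheme described above."""
        x = float(x1)
        for n in range(1, iterations + 1):
            a_n = 1.0 / n                 # step-size sequence
            c_n = n ** (-1.0 / 3.0)       # shrinking spacing between observation levels
            x += a_n * (observe(x + c_n) - observe(x - c_n)) / c_n
        return x

    # Example: M(x) = -(x - 2)^2 observed with additive noise; the maximizer is 2.
    rng = np.random.default_rng(0)
    observe = lambda x: -(x - 2.0) ** 2 + 0.1 * rng.standard_normal()
    print(stochastic_maximize(observe, x1=0.0))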

2,141 citations

Book
01 Jan 1969

699 citations

Journal ArticleDOI
TL;DR: In this article, the authors investigate the properties of a recursive estimation procedure (the method of "back-propagation") for a class of nonlinear regression models (single hidden-layer feedforward network models) recently developed by cognitive scientists.
Abstract: We investigate the properties of a recursive estimation procedure (the method of “back-propagation”) for a class of nonlinear regression models (single hidden-layer feedforward network models) recently developed by cognitive scientists. The results follow from more general results for a class of recursive m estimators, obtained using theorems of Ljung (1977) and Walk (1977) for the method of stochastic approximation. Conditions are given ensuring that the back-propagation estimator converges almost surely to a parameter value that locally minimizes expected squared error loss (provided the estimator does not diverge) and that the back-propagation estimator is asymptotically normal when centered at this minimizer. This estimator is shown to be statistically inefficient, and a two-step procedure that has efficiency equivalent to that of nonlinear least squares is proposed. Practical issues are illustrated by a numerical example involving approximation of the Henon map.
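A minimal sketch of the recursive (per-observation) estimation procedure that the analysis concerns: a single-hidden-layer network updated one observation at a time by back-propagated squared-error gradients. The architecture, learning rate, and synthetic data are placeholder choices; the convergence theory additionally assumes a suitably decreasing gain sequence.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic regression data (placeholder target, not the Henon-map example).
    X = rng.uniform(-1, 1, size=(2000, 1))
    y = np.sin(3 * X[:, 0]) + 0.05 * rng.standard_normal(2000)

    # Single-hidden-layer network: y_hat = W2 @ tanh(W1 x + b1) + b2
    hidden = 16
    W1, b1 = 0.5 * rng.standard_normal((hidden, 1)), np.zeros(hidden)
    W2, b2 = 0.5 * rng.standard_normal(hidden), 0.0
    lr = 0.05   # fixed gain for illustration

    for epoch in range(20):
        for x_i, y_i in zip(X, y):              # recursive (per-observation) updates
            h = np.tanh(W1 @ x_i + b1)          # hidden activations
            y_hat = W2 @ h + b2
            err = y_hat - y_i                   # derivative of 0.5 * squared error
            gW2, gb2 = err * h, err             # back-propagated gradients
            gh = err * W2 * (1 - h ** 2)
            gW1, gb1 = np.outer(gh, x_i), gh
            W2 -= lr * gW2; b2 -= lr * gb2
            W1 -= lr * gW1; b1 -= lr * gb1

    pred = np.tanh(X @ W1.T + b1) @ W2 + b2
    print("mean squared error:", float(np.mean((pred - y) ** 2)))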

448 citations