Journal ArticleDOI

Implementation of the simultaneous perturbation algorithm for stochastic optimization

01 Jul 1998-IEEE Transactions on Aerospace and Electronic Systems (IEEE)-Vol. 34, Iss: 3, pp 817-823
TL;DR: This paper presents a simple step-by-step guide to implementation of SPSA in generic optimization problems and offers some practical suggestions for choosing certain algorithm coefficients.
Abstract: The need for solving multivariate optimization problems is pervasive in engineering and the physical and social sciences. The simultaneous perturbation stochastic approximation (SPSA) algorithm has recently attracted considerable attention for challenging optimization problems where it is difficult or impossible to directly obtain a gradient of the objective function with respect to the parameters being optimized. SPSA is based on an easily implemented and highly efficient gradient approximation that relies on measurements of the objective function, not on measurements of the gradient of the objective function. The gradient approximation is based on only two function measurements (regardless of the dimension of the gradient vector). This contrasts with standard finite-difference approaches, which require a number of function measurements proportional to the dimension of the gradient vector. This paper presents a simple step-by-step guide to implementation of SPSA in generic optimization problems and offers some practical suggestions for choosing certain algorithm coefficients.
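To make the two-measurement gradient estimate concrete, the sketch below implements the basic SPSA iteration in Python. The gain-sequence exponents 0.602 and 0.101 follow the paper's practical suggestions; the remaining constants (a, c, A) and the noisy quadratic test function are placeholder choices for illustration only.

    import numpy as np

    def spsa_minimize(loss, theta0, iterations=1000, a=0.1, c=0.1, A=100,
                      alpha=0.602, gamma=0.101, rng=None):
        """Minimize `loss` using only (possibly noisy) function measurements."""
        rng = np.random.default_rng() if rng is None else rng
        theta = np.asarray(theta0, dtype=float).copy()
        for k in range(iterations):
            a_k = a / (k + 1 + A) ** alpha                       # step-size gain sequence
            c_k = c / (k + 1) ** gamma                           # perturbation-size sequence
            delta = rng.choice([-1.0, 1.0], size=theta.shape)    # simultaneous +/-1 perturbation
            y_plus = loss(theta + c_k * delta)                   # first function measurement
            y_minus = loss(theta - c_k * delta)                  # second (and last) measurement
            g_hat = (y_plus - y_minus) / (2.0 * c_k * delta)     # gradient estimate, any dimension
            theta -= a_k * g_hat
        return theta

    # Example: a noisy 20-dimensional quadratic (placeholder objective).
    rng = np.random.default_rng(0)
    noisy_quadratic = lambda th: float(np.sum(th ** 2) + 0.01 * rng.standard_normal())
    print(spsa_minimize(noisy_quadratic, np.ones(20), rng=rng))

Note that each iteration costs exactly two loss evaluations, regardless of the dimension of theta.
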
Citations
Journal ArticleDOI
14 Sep 2017-Nature
TL;DR: The experimental optimization of Hamiltonian problems with up to six qubits and more than one hundred Pauli terms is demonstrated, determining the ground-state energy for molecules of increasing size, up to BeH2.
Abstract: The ground-state energy of small molecules is determined efficiently using six qubits of a superconducting quantum processor. Quantum simulation is currently the most promising application of quantum computers. However, only a few quantum simulations of very small systems have been performed experimentally. Here, researchers from IBM present quantum simulations of larger systems using a variational quantum eigenvalue solver (or eigensolver), a previously suggested method for quantum optimization. They perform quantum chemical calculations of LiH and BeH2 and an energy minimization procedure on a four-qubit Heisenberg model. Their application of the variational quantum eigensolver is hardware-efficient, which means that it is optimized on the given architecture. Noise is a big problem in this implementation, but quantum error correction could eventually help this experimental set-up to yield a quantum simulation of chemically interesting systems on a quantum computer. Quantum computers can be used to address electronic-structure problems and problems in materials science and condensed matter physics that can be formulated as interacting fermionic problems, problems which stretch the limits of existing high-performance computers1. Finding exact solutions to such problems numerically has a computational cost that scales exponentially with the size of the system, and Monte Carlo methods are unsuitable owing to the fermionic sign problem. These limitations of classical computational methods have made solving even few-atom electronic-structure problems interesting for implementation using medium-sized quantum computers. Yet experimental implementations have so far been restricted to molecules involving only hydrogen and helium2,3,4,5,6,7,8. Here we demonstrate the experimental optimization of Hamiltonian problems with up to six qubits and more than one hundred Pauli terms, determining the ground-state energy for molecules of increasing size, up to BeH2. We achieve this result by using a variational quantum eigenvalue solver (eigensolver) with efficiently prepared trial states that are tailored specifically to the interactions that are available in our quantum processor, combined with a compact encoding of fermionic Hamiltonians9 and a robust stochastic optimization routine10. We demonstrate the flexibility of our approach by applying it to a problem of quantum magnetism, an antiferromagnetic Heisenberg model in an external magnetic field. In all cases, we find agreement between our experiments and numerical simulations using a model of the device with noise. Our results help to elucidate the requirements for scaling the method to larger systems and for bridging the gap between key problems in high-performance computing and their implementation on quantum hardware.
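As a structural illustration only: the variational approach treats the energy expectation value of a parameterized trial state as a classical objective function to be minimized. The sketch below uses a toy two-qubit Hamiltonian, a small layered ansatz, and a generic SciPy optimizer as placeholders; it is not the hardware-efficient ansatz, fermionic encoding, or SPSA routine used in the paper.

    import numpy as np
    from scipy.optimize import minimize

    # Toy two-qubit Hamiltonian and gates (placeholders, not the molecular Hamiltonians).
    I2 = np.eye(2)
    X = np.array([[0.0, 1.0], [1.0, 0.0]])
    Z = np.diag([1.0, -1.0])
    H = np.kron(Z, Z) + 0.5 * (np.kron(X, I2) + np.kron(I2, X))
    CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)

    def ry(t):
        """Single-qubit rotation about the y axis."""
        return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                         [np.sin(t / 2),  np.cos(t / 2)]])

    def energy(theta):
        """Expectation value <psi(theta)|H|psi(theta)> for a small layered ansatz."""
        state = np.zeros(4); state[0] = 1.0                  # start in |00>
        state = np.kron(ry(theta[0]), ry(theta[1])) @ state  # single-qubit rotation layer
        state = CNOT @ state                                 # entangling layer
        state = np.kron(ry(theta[2]), ry(theta[3])) @ state  # second rotation layer
        return float(state @ H @ state)

    result = minimize(energy, x0=0.1 * np.ones(4), method="COBYLA")
    print("variational energy:", result.fun)
    print("exact ground energy:", np.linalg.eigvalsh(H)[0])  # ansatz may not reach it exactly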

2,348 citations


Cites background from "Implementation of the simultaneous ..."

  • ...101} [20], ensuring the smoothest descent along the approximate gradients defined in Eq....


  • ...Their utility spans from combinatorial optimization problems [17, 18] to quantum chemistry in the form of variational quantum eigensolvers (VQEs), where they were introduced to reduce coherence requirements on quantum hardware [4, 19, 20]....


Journal ArticleDOI
TL;DR: A Composite PSO, in which the heuristic parameters of PSO are controlled by a Differential Evolution algorithm during the optimization, is described, and results for many well-known and widely used test functions are given.
Abstract: This paper presents an overview of our most recent results concerning the Particle Swarm Optimization (PSO) method. Techniques for the alleviation of local minima, and for detecting multiple minimizers are described. Moreover, results on the ability of PSO to tackle Multiobjective, Minimax, Integer Programming and ℓ1 errors-in-variables problems, as well as problems in noisy and continuously changing environments, are reported. Finally, a Composite PSO, in which the heuristic parameters of PSO are controlled by a Differential Evolution algorithm during the optimization, is described, and results for many well-known and widely used test functions are given.
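For context, a minimal global-best PSO with the common inertia-weight velocity update is sketched below; the coefficients (w, c1, c2), swarm size, and Rastrigin test function are generic textbook choices, not the specific variants or settings studied in the paper.

    import numpy as np

    def pso_minimize(f, lower, upper, n_particles=30, iterations=200,
                     w=0.7, c1=1.5, c2=1.5, seed=0):
        """Global-best particle swarm optimization (inertia-weight variant)."""
        rng = np.random.default_rng(seed)
        lower, upper = np.asarray(lower, float), np.asarray(upper, float)
        dim = lower.size
        x = rng.uniform(lower, upper, size=(n_particles, dim))     # positions
        v = np.zeros_like(x)                                       # velocities
        pbest = x.copy()                                           # personal best positions
        pbest_val = np.array([f(p) for p in x])
        gbest = pbest[np.argmin(pbest_val)].copy()                 # global best position
        for _ in range(iterations):
            r1, r2 = rng.random((2, n_particles, dim))
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = np.clip(x + v, lower, upper)
            vals = np.array([f(p) for p in x])
            improved = vals < pbest_val
            pbest[improved], pbest_val[improved] = x[improved], vals[improved]
            gbest = pbest[np.argmin(pbest_val)].copy()
        return gbest, float(pbest_val.min())

    # Example: 2-D Rastrigin function, a standard multimodal test problem.
    rastrigin = lambda p: 10 * p.size + float(np.sum(p ** 2 - 10 * np.cos(2 * np.pi * p)))
    print(pso_minimize(rastrigin, lower=np.full(2, -5.12), upper=np.full(2, 5.12)))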

1,436 citations


Cites methods from "Implementation of the simultaneous ..."

  • ...…by means of finite differencing, (5) the simultaneous perturbation stochastic approximation algorithm due to Spall (Spall, 1992; Spall, 1998a; Spall, 1998b), (6) the evolutionary gradient search algorithm of Salomon (Salomon, 1998), (7) the evolution strategy with cumulative mutation…...


Journal ArticleDOI
TL;DR: This review paper will summarize key developments in history matching and then review many of the accomplishments of the past decade, including developments in reparameterization of the model variables, methods for computation of the sensitivity coefficients, and methods for quantifying uncertainty.
Abstract: History matching is a type of inverse problem in which observed reservoir behavior is used to estimate reservoir model variables that caused the behavior. Obtaining even a single history-matched reservoir model requires a substantial amount of effort, but the past decade has seen remarkable progress in the ability to generate reservoir simulation models that match large amounts of production data. Progress can be partially attributed to an increase in computational power, but the widespread adoption of geostatistics and Monte Carlo methods has also contributed indirectly. In this review paper, we will summarize key developments in history matching and then review many of the accomplishments of the past decade, including developments in reparameterization of the model variables, methods for computation of the sensitivity coefficients, and methods for quantifying uncertainty. An attempt has been made to compare representative procedures and to identify possible limitations of each.

726 citations

Journal ArticleDOI
TL;DR: The proposed method is based on a kriging meta-model that provides a global prediction of the objective values and a measure of prediction uncertainty at every point; it shows excellent consistency and efficiency in finding globally optimal solutions.
Abstract: This paper proposes a new method that extends the efficient global optimization to address stochastic black-box systems. The method is based on a kriging meta-model that provides a global prediction of the objective values and a measure of prediction uncertainty at every point. The criterion for the infill sample selection is an augmented expected improvement function with desirable properties for stochastic responses. The method is empirically compared with the revised simplex search, the simultaneous perturbation stochastic approximation, and the DIRECT methods using six test problems from the literature. An application case study on an inventory system is also documented. The results suggest that the proposed method has excellent consistency and efficiency in finding global optimal solutions, and is particularly useful for expensive systems.
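The infill criterion builds on the classical expected-improvement function of efficient global optimization; the paper's augmented version adds corrections for noisy responses, but the underlying quantity can be sketched as below (a simplified, deterministic-response version, not the augmented criterion itself).

    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, best):
        """Classical expected improvement for minimization, given a kriging/GP
        prediction mean `mu` and standard deviation `sigma` at candidate points,
        and the current best observed objective value `best`."""
        sigma = np.maximum(sigma, 1e-12)          # guard against zero predicted variance
        z = (best - mu) / sigma
        return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    # Example: rank three candidate infill points from a hypothetical surrogate model.
    mu = np.array([1.2, 0.8, 1.0])
    sigma = np.array([0.05, 0.40, 0.20])
    print(expected_improvement(mu, sigma, best=0.9))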

632 citations

Journal ArticleDOI
TL;DR: This work compares the performance of eight optimization methods, including gradient descent (with two step-size selection algorithms), quasi-Newton, nonlinear conjugate gradient, Kiefer-Wolfowitz, simultaneous perturbation, Robbins-Monro, and evolution strategy, and shows that the Robbins-Monro method is the best choice in most applications.
Abstract: A popular technique for nonrigid registration of medical images is based on the maximization of their mutual information, in combination with a deformation field parameterized by cubic B-splines. The coordinate mapping that relates the two images is found using an iterative optimization procedure. This work compares the performance of eight optimization methods: gradient descent (with two different step size selection algorithms), quasi-Newton, nonlinear conjugate gradient, Kiefer-Wolfowitz, simultaneous perturbation, Robbins-Monro, and evolution strategy. Special attention is paid to computation time reduction by using fewer voxels to calculate the cost function and its derivatives. The optimization methods are tested on manually deformed CT images of the heart, on follow-up CT chest scans, and on MR scans of the prostate acquired using BFFE, T1, and T2 protocols. Registration accuracy is assessed by computing the overlap of segmented edges. Precision and convergence properties are studied by comparing deformation fields. The results show that the Robbins-Monro method is the best choice in most applications. With this approach, the computation time per iteration can be lowered approximately 500 times without affecting the rate of convergence by using a small subset of the image, randomly selected in every iteration, to compute the derivative of the mutual information. Of the other methods, the quasi-Newton and the nonlinear conjugate gradient method achieve a slightly higher precision, at the price of larger computation times.
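The main computational trick reported here, estimating the derivative of the cost from a small, freshly drawn random subset of voxels in every Robbins-Monro iteration, can be sketched generically as below. The gain constants, subset size, and the toy least-squares problem are placeholders; `grad_per_sample` stands in for the problem-specific mutual-information derivative and is only an assumption of this sketch.

    import numpy as np

    def robbins_monro_subset(grad_per_sample, theta0, n_samples, subset_size=64,
                             iterations=500, a=1.0, A=50, alpha=0.602, seed=0):
        """Robbins-Monro style descent with a stochastic gradient computed from a
        small random subset of samples (e.g. voxels) redrawn at every iteration."""
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, float).copy()
        for k in range(iterations):
            idx = rng.choice(n_samples, size=subset_size, replace=False)  # new voxel subset
            g_hat = grad_per_sample(theta, idx)                           # cheap gradient estimate
            theta -= (a / (k + 1 + A) ** alpha) * g_hat                   # decaying gain sequence
        return theta

    # Toy example: estimate a scalar offset from noisy samples using subset gradients.
    rng = np.random.default_rng(1)
    samples = 0.3 + 0.05 * rng.standard_normal(10000)
    grad = lambda th, idx: np.array([np.mean(th[0] - samples[idx])])  # gradient of 0.5*mean((th - y)^2)
    print(robbins_monro_subset(grad, np.zeros(1), n_samples=samples.size))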

460 citations

References
Journal ArticleDOI
TL;DR: A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point.
Abstract: A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point. The simplex adapts itself to the local landscape, and contracts on to the final minimum. The method is shown to be effective and computationally compact. A procedure is given for the estimation of the Hessian matrix in the neighbourhood of the minimum, needed in statistical estimation problems.
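The simplex method described here is available in standard numerical libraries; a minimal usage sketch follows (the test function, starting point, and tolerances are arbitrary choices).

    import numpy as np
    from scipy.optimize import minimize

    # Minimize the 2-D Rosenbrock function with the Nelder-Mead simplex method.
    rosenbrock = lambda p: (1 - p[0]) ** 2 + 100 * (p[1] - p[0] ** 2) ** 2
    result = minimize(rosenbrock, x0=np.array([-1.2, 1.0]), method="Nelder-Mead",
                      options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 5000})
    print(result.x, result.fun)   # the minimizer is at (1, 1) with value 0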

27,271 citations

Journal ArticleDOI
TL;DR: The paper presents an SA algorithm based on a simultaneous perturbation gradient approximation instead of the standard finite-difference approximation of Kiefer-Wolfowitz-type procedures, and shows that it can be significantly more efficient than the standard algorithms in large-dimensional problems.
Abstract: The problem of finding a root of the multivariate gradient equation that arises in function minimization is considered. When only noisy measurements of the function are available, a stochastic approximation (SA) algorithm of the general Kiefer-Wolfowitz type is appropriate for estimating the root. The paper presents an SA algorithm that is based on a simultaneous perturbation gradient approximation instead of the standard finite-difference approximation of Kiefer-Wolfowitz type procedures. Theory and numerical experience indicate that the algorithm can be significantly more efficient than the standard algorithms in large-dimensional problems.
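The efficiency contrast between the two gradient approximations is purely in measurement count: a two-sided finite-difference estimate perturbs one coordinate at a time and needs 2p function evaluations in p dimensions, whereas the simultaneous perturbation estimate always needs two. A minimal sketch (the perturbation size and test function are placeholders):

    import numpy as np

    def fd_gradient(loss, theta, c=1e-2):
        """Two-sided finite-difference gradient: 2 * len(theta) loss evaluations."""
        g = np.zeros_like(theta)
        for i in range(theta.size):
            e = np.zeros_like(theta)
            e[i] = c
            g[i] = (loss(theta + e) - loss(theta - e)) / (2 * c)
        return g

    def sp_gradient(loss, theta, c=1e-2, rng=None):
        """Simultaneous perturbation gradient: 2 loss evaluations in any dimension."""
        rng = np.random.default_rng(0) if rng is None else rng
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        return (loss(theta + c * delta) - loss(theta - c * delta)) / (2 * c * delta)

    loss = lambda th: float(np.sum(th ** 2))
    theta = np.arange(1.0, 6.0)            # 5-dimensional test point
    print(fd_gradient(loss, theta))        # 10 loss evaluations
    print(sp_gradient(loss, theta))        # 2 loss evaluations (noisier per estimate)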

2,149 citations

Journal ArticleDOI
TL;DR: In this article, the authors give a scheme whereby, starting from an arbitrary point $x_1$, one obtains successively $x_2, x_3, \cdots$ such that $x_n$ converges in probability to the unknown maximizer $\theta$ of the regression function as $n \rightarrow \infty$.
Abstract: Let $M(x)$ be a regression function which has a maximum at the unknown point $\theta$. $M(x)$ is itself unknown to the statistician who, however, can take observations at any level $x$. This paper gives a scheme whereby, starting from an arbitrary point $x_1$, one obtains successively $x_2, x_3, \cdots$ such that $x_n$ converges to $\theta$ in probability as $n \rightarrow \infty$.
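A one-dimensional sketch of the recursion described here: each step compares noisy observations of $M$ on either side of the current iterate and moves uphill. The gain sequences $a_n = 1/n$ and $c_n = n^{-1/3}$ are standard choices satisfying the usual convergence conditions, and the noisy parabola is a placeholder objective.

    import numpy as np

    def stochastic_maximize(observe, x1, iterations=2000):
        """Approximate the maximizer of a regression function M(x) from noisy
        observations `observe(x)`, using the two-sided scheme described above."""
        x = float(x1)
        for n in range(1, iterations + 1):
            a_n = 1.0 / n                 # step-size sequence
            c_n = n ** (-1.0 / 3.0)       # shrinking spacing between observation levels
            x += a_n * (observe(x + c_n) - observe(x - c_n)) / c_n
        return x

    # Example: M(x) = -(x - 2)^2 observed with additive noise; the maximizer is 2.
    rng = np.random.default_rng(0)
    observe = lambda x: -(x - 2.0) ** 2 + 0.1 * rng.standard_normal()
    print(stochastic_maximize(observe, x1=0.0))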

2,141 citations

Book
01 Jan 1969

699 citations

Journal ArticleDOI
TL;DR: In this article, the authors investigate the properties of a recursive estimation procedure (the method of "back-propagation") for a class of nonlinear regression models (single hidden-layer feedforward network models) recently developed by cognitive scientists.
Abstract: We investigate the properties of a recursive estimation procedure (the method of “back-propagation”) for a class of nonlinear regression models (single hidden-layer feedforward network models) recently developed by cognitive scientists. The results follow from more general results for a class of recursive m estimators, obtained using theorems of Ljung (1977) and Walk (1977) for the method of stochastic approximation. Conditions are given ensuring that the back-propagation estimator converges almost surely to a parameter value that locally minimizes expected squared error loss (provided the estimator does not diverge) and that the back-propagation estimator is asymptotically normal when centered at this minimizer. This estimator is shown to be statistically inefficient, and a two-step procedure that has efficiency equivalent to that of nonlinear least squares is proposed. Practical issues are illustrated by a numerical example involving approximation of the Henon map.
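A minimal sketch of the recursive (per-observation) estimation procedure that the analysis concerns: a single-hidden-layer network updated one observation at a time by back-propagated squared-error gradients. The architecture, learning rate, and synthetic data are placeholder choices; the convergence theory additionally assumes a suitably decreasing gain sequence.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic regression data (placeholder target, not the Henon-map example).
    X = rng.uniform(-1, 1, size=(2000, 1))
    y = np.sin(3 * X[:, 0]) + 0.05 * rng.standard_normal(2000)

    # Single-hidden-layer network: y_hat = W2 @ tanh(W1 x + b1) + b2
    hidden = 16
    W1, b1 = 0.5 * rng.standard_normal((hidden, 1)), np.zeros(hidden)
    W2, b2 = 0.5 * rng.standard_normal(hidden), 0.0
    lr = 0.05   # fixed gain for illustration

    for epoch in range(20):
        for x_i, y_i in zip(X, y):              # recursive (per-observation) updates
            h = np.tanh(W1 @ x_i + b1)          # hidden activations
            y_hat = W2 @ h + b2
            err = y_hat - y_i                   # derivative of 0.5 * squared error
            gW2, gb2 = err * h, err             # back-propagated gradients
            gh = err * W2 * (1 - h ** 2)
            gW1, gb1 = np.outer(gh, x_i), gh
            W2 -= lr * gW2; b2 -= lr * gb2
            W1 -= lr * gW1; b1 -= lr * gb1

    pred = np.tanh(X @ W1.T + b1) @ W2 + b2
    print("mean squared error:", float(np.mean((pred - y) ** 2)))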

448 citations