scispace - formally typeset
Search or ask a question
Author

James C. Spall

Other affiliations: Johns Hopkins University
Bio: James C. Spall is an academic researcher from Johns Hopkins University Applied Physics Laboratory. The author has contributed to research in topics: Stochastic approximation & Simultaneous perturbation stochastic approximation. The author has an hindex of 36, co-authored 182 publications receiving 8562 citations. Previous affiliations of James C. Spall include Johns Hopkins University.


Papers
More filters
Journal ArticleDOI
TL;DR: The paper presents an SA algorithm that is based on a simultaneous perturbation gradient approximation instead of the standard finite-difference approximation of Keifer-Wolfowitz type procedures that can be significantly more efficient than the standard algorithms in large-dimensional problems.
Abstract: The problem of finding a root of the multivariate gradient equation that arises in function minimization is considered. When only noisy measurements of the function are available, a stochastic approximation (SA) algorithm for the general Kiefer-Wolfowitz type is appropriate for estimating the root. The paper presents an SA algorithm that is based on a simultaneous perturbation gradient approximation instead of the standard finite-difference approximation of Keifer-Wolfowitz type procedures. Theory and numerical experience indicate that the algorithm can be significantly more efficient than the standard algorithms in large-dimensional problems. >

2,149 citations

Book
01 Mar 2003
TL;DR: A survey of the range of topics, in a strong, interdisciplinary format that will appeal to both students and researchers.
Abstract: From the Publisher: * Unique in its survey of the range of topics. * Contains a strong, interdisciplinary format that will appeal to both students and researchers. * Features exercises and web links to software and data sets.

1,349 citations

Journal ArticleDOI
TL;DR: This paper presents a simple step-by-step guide to implementation of SPSA in generic optimization problems and offers some practical suggestions for choosing certain algorithm coefficients.
Abstract: The need for solving multivariate optimization problems is pervasive in engineering and the physical and social sciences. The simultaneous perturbation stochastic approximation (SPSA) algorithm has recently attracted considerable attention for challenging optimization problems where it is difficult or impossible to directly obtain a gradient of the objective function with respect to the parameters being optimized. SPSA is based on an easily implemented and highly efficient gradient approximation that relies on measurements of the objective function, not on measurements of the gradient of the objective function. The gradient approximation is based on only two function measurements (regardless of the dimension of the gradient vector). This contrasts with standard finite-difference approaches, which require a number of function measurements proportional to the dimension of the gradient vector. This paper presents a simple step-by-step guide to implementation of SPSA in generic optimization problems and offers some practical suggestions for choosing certain algorithm coefficients.

759 citations

Journal ArticleDOI
TL;DR: The paper presents a general adaptive SA algorithm that is based on a simple method for estimating the Hessian matrix, while concurrently estimating the primary parameters of interest, based on the "simultaneous perturbation (SP)" idea introduced previously.
Abstract: Stochastic approximation (SA) has long been applied for problems of minimizing loss functions or root finding with noisy input information. As with all stochastic search algorithms, there are adjustable algorithm coefficients that must be specified, and that can have a profound effect on algorithm performance. It is known that choosing these coefficients according to an SA analog of the deterministic Newton-Raphson algorithm provides an optimal or near-optimal form of the algorithm. However, directly determining the required Hessian matrix (or Jacobian matrix for root finding) to achieve this algorithm form has often been difficult or impossible in practice. The paper presents a general adaptive SA algorithm that is based on a simple method for estimating the Hessian matrix, while concurrently estimating the primary parameters of interest. The approach applies in both the gradient-free optimization (Kiefer-Wolfowitz) and root-finding/stochastic gradient-based (Robbins-Monro) settings, and is based on the "simultaneous perturbation (SP)" idea introduced previously. The algorithm requires only a small number of loss function or gradient measurements per iteration-independent of the problem dimension-to adaptively estimate the Hessian and parameters of primary interest. Aside from introducing the adaptive SP approach, the paper presents practical implementation guidance, asymptotic theory, and a nontrivial numerical evaluation. Also included is a discussion and numerical analysis comparing the adaptive SP approach with the iterate-averaging approach to accelerated SA.

426 citations

01 Jan 1999
TL;DR: Simultaneous perturbation stochastic approximation (SPSA) as mentioned in this paper is a widely used method for multivariate optimization problems that requires only two measurements of the objective function regardless of the dimension of the optimization problem.
Abstract: ultivariate stochastic optimization plays a major role in the analysis and control of many engineering systems. In almost all real-world optimization problems, it is necessary to use a mathematical algorithm that iteratively seeks out the solution because an analytical (closed-form) solution is rarely available. In this spirit, the “simultaneous perturbation stochastic approximation (SPSA)” method for difficult multivariate optimization problems has been developed. SPSA has recently attracted considerable international attention in areas such as statistical parameter estimation, feedback control, simulation-based optimization, signal and image processing, and experimental design. The essential feature of SPSA—which accounts for its power and relative ease of implementation—is the underlying gradient approximation that requires only two measurements of the objective function regardless of the dimension of the optimization problem. This feature allows for a significant decrease in the cost of optimization, especially in problems with a large number of variables to be optimized. (

378 citations


Cited by
More filters
Journal Article
TL;DR: This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment.
Abstract: THE DESIGN AND ANALYSIS OF EXPERIMENTS. By Oscar Kempthorne. New York, John Wiley and Sons, Inc., 1952. 631 pp. $8.50. This book by a teacher of statistics (as well as a consultant for \"experimenters\") is a comprehensive study of the philosophical background for the statistical design of experiment. It is necessary to have some facility with algebraic notation and manipulation to be able to use the volume intelligently. The problems are presented from the theoretical point of view, without such practical examples as would be helpful for those not acquainted with mathematics. The mathematical justification for the techniques is given. As a somewhat advanced treatment of the design and analysis of experiments, this volume will be interesting and helpful for many who approach statistics theoretically as well as practically. With emphasis on the \"why,\" and with description given broadly, the author relates the subject matter to the general theory of statistics and to the general problem of experimental inference. MARGARET J. ROBERTSON

13,333 citations

Journal Article
TL;DR: This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid, and shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper- parameter optimization algorithms.
Abstract: Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success--they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.

6,935 citations

Book ChapterDOI
01 Jan 2011
TL;DR: Weakconvergence methods in metric spaces were studied in this article, with applications sufficient to show their power and utility, and the results of the first three chapters are used in Chapter 4 to derive a variety of limit theorems for dependent sequences of random variables.
Abstract: The author's preface gives an outline: "This book is about weakconvergence methods in metric spaces, with applications sufficient to show their power and utility. The Introduction motivates the definitions and indicates how the theory will yield solutions to problems arising outside it. Chapter 1 sets out the basic general theorems, which are then specialized in Chapter 2 to the space C[0, l ] of continuous functions on the unit interval and in Chapter 3 to the space D [0, 1 ] of functions with discontinuities of the first kind. The results of the first three chapters are used in Chapter 4 to derive a variety of limit theorems for dependent sequences of random variables. " The book develops and expands on Donsker's 1951 and 1952 papers on the invariance principle and empirical distributions. The basic random variables remain real-valued although, of course, measures on C[0, l ] and D[0, l ] are vitally used. Within this framework, there are various possibilities for a different and apparently better treatment of the material. More of the general theory of weak convergence of probabilities on separable metric spaces would be useful. Metrizability of the convergence is not brought up until late in the Appendix. The close relation of the Prokhorov metric and a metric for convergence in probability is (hence) not mentioned (see V. Strassen, Ann. Math. Statist. 36 (1965), 423-439; the reviewer, ibid. 39 (1968), 1563-1572). This relation would illuminate and organize such results as Theorems 4.1, 4.2 and 4.4 which give isolated, ad hoc connections between weak convergence of measures and nearness in probability. In the middle of p. 16, it should be noted that C*(S) consists of signed measures which need only be finitely additive if 5 is not compact. On p. 239, where the author twice speaks of separable subsets having nonmeasurable cardinal, he means "discrete" rather than "separable." Theorem 1.4 is Ulam's theorem that a Borel probability on a complete separable metric space is tight. Theorem 1 of Appendix 3 weakens completeness to topological completeness. After mentioning that probabilities on the rationals are tight, the author says it is an

3,554 citations

Journal ArticleDOI
TL;DR: For instance, mean-field variational inference as discussed by the authors approximates probability densities through optimization, which is used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling.
Abstract: One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this article, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find a member of that family which is close to the target density. Closeness is measured by Kullback–Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data...

3,421 citations

Journal ArticleDOI
TL;DR: The SCA algorithm obtains a smooth shape for the airfoil with a very low drag, which demonstrates that this algorithm can highly be effective in solving real problems with constrained and unknown search spaces.
Abstract: This paper proposes a novel population-based optimization algorithm called Sine Cosine Algorithm (SCA) for solving optimization problems. The SCA creates multiple initial random candidate solutions and requires them to fluctuate outwards or towards the best solution using a mathematical model based on sine and cosine functions. Several random and adaptive variables also are integrated to this algorithm to emphasize exploration and exploitation of the search space in different milestones of optimization. The performance of SCA is benchmarked in three test phases. Firstly, a set of well-known test cases including unimodal, multi-modal, and composite functions are employed to test exploration, exploitation, local optima avoidance, and convergence of SCA. Secondly, several performance metrics (search history, trajectory, average fitness of solutions, and the best solution during optimization) are used to qualitatively observe and confirm the performance of SCA on shifted two-dimensional test functions. Finally, the cross-section of an aircraft's wing is optimized by SCA as a real challenging case study to verify and demonstrate the performance of this algorithm in practice. The results of test functions and performance metrics prove that the algorithm proposed is able to explore different regions of a search space, avoid local optima, converge towards the global optimum, and exploit promising regions of a search space during optimization effectively. The SCA algorithm obtains a smooth shape for the airfoil with a very low drag, which demonstrates that this algorithm can highly be effective in solving real problems with constrained and unknown search spaces. Note that the source codes of the SCA algorithm are publicly available at http://www.alimirjalili.com/SCA.html .

3,088 citations