Author

Stephen J. Wright

Bio: Stephen J. Wright is an academic researcher from the University of Wisconsin-Madison. The author has contributed to research in topics including Interior point method and Nonlinear programming. The author has an h-index of 61 and has co-authored 294 publications receiving 46,774 citations. Previous affiliations of Stephen J. Wright include Argonne National Laboratory and Birkbeck, University of London.


Papers
Book
01 Nov 2008
TL;DR: Numerical Optimization presents a comprehensive and up-to-date description of the most effective methods in continuous optimization, responding to the growing interest in optimization in engineering, science, and business by focusing on the methods that are best suited to practical problems.
Abstract: Numerical Optimization presents a comprehensive and up-to-date description of the most effective methods in continuous optimization. It responds to the growing interest in optimization in engineering, science, and business by focusing on the methods that are best suited to practical problems. For this new edition the book has been thoroughly updated throughout. There are new chapters on nonlinear interior methods and derivative-free methods for optimization, both of which are used widely in practice and the focus of much current research. Because of the emphasis on practical methods, as well as the extensive illustrations and exercises, the book is accessible to a wide audience. It can be used as a graduate text in engineering, operations research, mathematics, computer science, and business. It also serves as a handbook for researchers and practitioners in the field. The authors have strived to produce a text that is pleasant to read, informative, and rigorous - one that reveals both the beautiful nature of the discipline and its practical side.

17,420 citations

Journal ArticleDOI
TL;DR: This paper proposes gradient projection algorithms for the bound-constrained quadratic programming (BCQP) formulation of these problems and tests variants of this approach that select the line search parameters in different ways, including techniques based on the Barzilai-Borwein method.
Abstract: Many problems in signal processing and statistical inference involve finding sparse solutions to under-determined, or ill-conditioned, linear systems of equations. A standard approach consists in minimizing an objective function which includes a quadratic (squared ℓ2) error term combined with a sparseness-inducing regularization term. Basis pursuit, the least absolute shrinkage and selection operator (LASSO), wavelet-based deconvolution, and compressed sensing are a few well-known examples of this approach. This paper proposes gradient projection (GP) algorithms for the bound-constrained quadratic programming (BCQP) formulation of these problems. We test variants of this approach that select the line search parameters in different ways, including techniques based on the Barzilai-Borwein method. Computational experiments show that these GP approaches perform well in a wide range of applications, often being significantly faster (in terms of computation time) than competing methods. Although the performance of GP methods tends to degrade as the regularization term is de-emphasized, we show how they can be embedded in a continuation scheme to recover their efficient practical performance.
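As a rough illustration of the approach described above, the sketch below applies projected gradient steps with a Barzilai-Borwein step length to the BCQP obtained by splitting x = u - v with u, v >= 0. Function and parameter names are illustrative assumptions, not the authors' GPSR implementation.

```python
import numpy as np

def gp_bb_l1(A, y, tau, n_iter=200):
    # Minimize 0.5*||A(u - v) - y||^2 + tau*sum(u) + tau*sum(v) over u, v >= 0
    # by projected gradient steps with a Barzilai-Borwein step length.
    n = A.shape[1]
    u = np.zeros(n)
    v = np.zeros(n)
    alpha = 1.0                       # initial step length
    z_old = grad_old = None
    for _ in range(n_iter):
        g = A.T @ (A @ (u - v) - y)   # gradient of the quadratic term
        grad_u = g + tau              # gradient with respect to u
        grad_v = -g + tau             # gradient with respect to v
        z = np.concatenate([u, v])
        grad = np.concatenate([grad_u, grad_v])
        if z_old is not None:         # Barzilai-Borwein step: alpha = s's / s'yk
            s, yk = z - z_old, grad - grad_old
            denom = s @ yk
            if denom > 1e-12:
                alpha = (s @ s) / denom
        z_old, grad_old = z, grad
        u = np.maximum(u - alpha * grad_u, 0.0)   # projection onto u >= 0
        v = np.maximum(v - alpha * grad_v, 0.0)   # projection onto v >= 0
    return u - v                      # recovered sparse solution x = u - v
```

In practice the published method adds a line search and a continuation scheme over tau; this sketch keeps only the projected BB step to show the core idea.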

3,488 citations

Book
01 Jan 1987
TL;DR: This book develops primal-dual interior-point methods for linear programming, covering the central path, path-following, potential-reduction, and infeasible-interior-point algorithms, their complexity and superlinear convergence theory, extensions, and practical implementation.
Abstract: Preface. Notation.
1. Introduction: Linear Programming; Primal-Dual Methods; The Central Path; A Primal-Dual Framework; Path-Following Methods; Potential-Reduction Methods; Infeasible Starting Points; Superlinear Convergence; Extensions; Mehrotra's Predictor-Corrector Algorithm; Linear Algebra Issues; Karmarkar's Algorithm.
2. Background: Linear Programming and Interior-Point Methods. Standard Form; Optimality Conditions, Duality, and Solution Sets; The B ∪ N Partition and Strict Complementarity; A Strictly Interior Point; Rank of the Matrix A; Bases and Vertices; Farkas's Lemma and a Proof of the Goldman-Tucker Result; The Central Path; Background: Primal Methods; Primal-Dual Methods: Development of the Fundamental Ideas; Notes and References.
3. Complexity Theory: Polynomial versus Exponential, Worst Case versus Average Case; Storing the Problem Data: Dimension and Size; The Turing Machine and Rational Arithmetic; Primal-Dual Methods and Rational Arithmetic; Linear Programming and Rational Numbers; Moving to a Solution from an Interior Point; Complexity of Simplex, Ellipsoid, and Interior-Point Methods; Polynomial and Strongly Polynomial Algorithms; Beyond the Turing Machine Model; More on the Real-Number Model and Algebraic Complexity; A General Complexity Theorem for Path-Following Methods; Notes and References.
4. Potential-Reduction Methods: A Primal-Dual Potential-Reduction Algorithm; Reducing Φ_ρ Forces Convergence; A Quadratic Estimate of Φ_ρ along a Feasible Direction; Bounding the Coefficients in the Quadratic Approximation; An Estimate of the Reduction in Φ_ρ and Polynomial Complexity; What About Centrality?; Choosing ρ and α; Notes and References.
5. Path-Following Algorithms: The Short-Step Path-Following Algorithm; Technical Results; The Predictor-Corrector Method; A Long-Step Path-Following Algorithm; Limit Points of the Iteration Sequence; Proof of Lemma 5.3; Notes and References.
6. Infeasible-Interior-Point Algorithms: The Algorithm; Convergence of Algorithm IPF; Technical Results I: Bounds on ν_k ‖(x^k, s^k)‖; Technical Results II: Bounds on (D^k)^{-1} Δx^k and D^k Δs^k; Technical Results III: A Uniform Lower Bound on α_k; Proofs of Theorems 6.1 and 6.2; Limit Points of the Iteration Sequence.
7. Superlinear Convergence and Finite Termination: Affine-Scaling Steps; An Estimate of (Δx, Δs): The Feasible Case; An Estimate of (Δx, Δs): The Infeasible Case; Algorithm PC Is Superlinear; Nearly Quadratic Methods; Convergence of Algorithm LPF+; Convergence of the Iteration Sequence; ε(A,b,c) and Finite Termination; A Finite Termination Strategy; Recovering an Optimal Basis; More on ε(A,b,c); Notes and References.
8. Extensions: The Monotone LCP; Mixed and Horizontal LCP; Strict Complementarity and LCP; Convex QP; Convex Programming; Monotone Nonlinear Complementarity and Variational Inequalities; Semidefinite Programming; Proof of Theorem 8.4; Notes and References.
9. Detecting Infeasibility: Self-Duality; The Simplified HSD Form; The HSDl Form; Identifying a Solution-Free Region; Implementations of the HSD Formulations; Notes and References.
10. Practical Aspects of Primal-Dual Algorithms: Motivation for Mehrotra's Algorithm; The Algorithm; Superquadratic Convergence; Second-Order Trajectory-Following Methods; Higher-Order Methods; Further Enhancements; Notes and References.
11. Implementations: Three Forms of the Step Equation; The Cholesky Factorization; Sparse Cholesky Factorization: Minimum-Degree Orderings; Other Orderings; Small Pivots in the Cholesky Factorization; Dense Columns in A; The Augmented System Formulation.
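To make the path-following framework in this outline concrete, the sketch below takes one damped Newton step on the perturbed KKT conditions of a standard-form LP (min c'x subject to Ax = b, x >= 0), with centering parameter σ and a fraction-to-the-boundary step length. It is a schematic illustration using dense linear algebra, not code from the book; all names are assumptions.

```python
import numpy as np

def primal_dual_step(A, b, c, x, lam, s, sigma=0.2):
    """One damped Newton step of a primal-dual path-following method."""
    m, n = A.shape
    mu = x @ s / n                        # duality measure
    # residuals of the perturbed KKT conditions
    r_c = A.T @ lam + s - c               # dual residual
    r_b = A @ x - b                       # primal residual
    r_xs = x * s - sigma * mu             # perturbed complementarity
    # assemble and solve the full Newton system
    X, S = np.diag(x), np.diag(s)
    J = np.block([
        [np.zeros((n, n)), A.T,              np.eye(n)],
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [S,                np.zeros((n, m)), X],
    ])
    d = np.linalg.solve(J, -np.concatenate([r_c, r_b, r_xs]))
    dx, dlam, ds = d[:n], d[n:n + m], d[n + m:]
    # step length keeping (x, s) strictly positive
    alpha = 1.0
    for v, dv in ((x, dx), (s, ds)):
        neg = dv < 0
        if neg.any():
            alpha = min(alpha, 0.99 * np.min(-v[neg] / dv[neg]))
    return x + alpha * dx, lam + alpha * dlam, s + alpha * ds
```

Practical codes instead eliminate variables to solve the normal-equations or augmented-system form with a sparse Cholesky factorization, as Chapter 11 of the outline indicates.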

2,277 citations

Book
28 Apr 2000
TL;DR: Numerical Optimization is a graduate text on continuous optimization that discusses algorithms and their practical performance in depth.
Abstract: Optimization is an important tool in decision science and in the analysis of physical systems, but its use requires substantial mathematical background. Numerical Optimization is a graduate text in continuous optimization that discusses algorithmic performance extensively, introducing new ideas progressively and treating the material thoroughly throughout.

2,193 citations

Proceedings Article
12 Dec 2011
TL;DR: In this paper, the authors present an update scheme called HOGWILD!, which gives processors unsynchronized access to shared memory with the possibility of overwriting each other's work, and show that it achieves a nearly optimal rate of convergence when the optimization problem is sparse.
Abstract: Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented without any locking. We present an update scheme called HOGWILD! which allows processors access to shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then HOGWILD! achieves a nearly optimal rate of convergence. We demonstrate experimentally that HOGWILD! outperforms alternative schemes that use locking by an order of magnitude.
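The update pattern described above can be sketched as follows: worker threads repeatedly apply sparse gradient updates to a shared weight vector with no locking. This is only a schematic illustration (Python's GIL prevents real parallel speedups here); the least-squares loss, sample format, and all names are assumptions, not the authors' implementation.

```python
import numpy as np
import threading

def worker(w, samples, lr=0.1):
    # Each sample is (idx, x_sparse, y): nonzero feature indices, their values,
    # and a target. Updates touch only the coordinates in idx, with no lock.
    for idx, x_sparse, y in samples:
        pred = w[idx] @ x_sparse              # prediction uses a few coordinates
        g = (pred - y) * x_sparse             # gradient of 0.5 * (pred - y)^2
        w[idx] -= lr * g                      # unsynchronized in-place update

rng = np.random.default_rng(0)
n_features, n_threads = 1000, 4
w = np.zeros(n_features)                      # shared parameter vector, no lock
threads = []
for _ in range(n_threads):
    samples = [(rng.choice(n_features, size=5, replace=False),
                rng.normal(size=5), rng.normal()) for _ in range(2000)]
    t = threading.Thread(target=worker, args=(w, samples))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
```

Because each update modifies only a handful of coordinates, two threads rarely touch the same entry at the same time, which is the sparsity condition the paper's convergence analysis relies on.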

1,939 citations


Cited by
Book
18 Nov 2016
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Journal ArticleDOI
TL;DR: AutoDock Vina achieves an approximately two orders of magnitude speed‐up compared with the molecular docking software previously developed in the lab, while also significantly improving the accuracy of the binding mode predictions, judging by tests on the training set used in AutoDock 4 development.
Abstract: AutoDock Vina, a new program for molecular docking and virtual screening, is presented. AutoDock Vina achieves an approximately two orders of magnitude speed-up compared with the molecular docking software previously developed in our lab (AutoDock 4), while also significantly improving the accuracy of the binding mode predictions, judging by our tests on the training set used in AutoDock 4 development. Further speed-up is achieved from parallelism, by using multithreading on multicore machines. AutoDock Vina automatically calculates the grid maps and clusters the results in a way transparent to the user.

20,059 citations

Book
D.L. Donoho
01 Jan 2004
TL;DR: It is possible to design n = O(N log(m)) nonadaptive measurements allowing reconstruction with accuracy comparable to that attainable with direct knowledge of the N most important coefficients, and a good approximation to those N important coefficients is extracted from the n measurements by solving a linear program, known as Basis Pursuit in signal processing.
Abstract: Suppose x is an unknown vector in ℝ^m (a digital image or signal); we plan to measure n general linear functionals of x and then reconstruct. If x is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements n can be dramatically smaller than the size m. Thus, certain natural classes of images with m pixels need only n = O(m^{1/4} log^{5/2}(m)) nonadaptive nonpixel samples for faithful recovery, as opposed to the usual m pixel samples. More specifically, suppose x has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor), so that the coefficients belong to an ℓ^p ball for 0 < p ≤ 1.
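A minimal sketch of the Basis Pursuit reconstruction mentioned above: min ‖x‖_1 subject to Ax = y is recast as a linear program in (u, v) with x = u - v and solved with an off-the-shelf LP solver. The helper name, problem sizes, and use of scipy.optimize.linprog are illustrative assumptions, not code from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    # min ||x||_1 s.t. A x = y, written as min sum(u) + sum(v)
    # s.t. A u - A v = y, u >= 0, v >= 0, with x = u - v.
    n = A.shape[1]
    c = np.ones(2 * n)                      # objective equals ||x||_1
    A_eq = np.hstack([A, -A])               # equality constraint A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y,
                  bounds=[(0, None)] * (2 * n), method="highs")
    u, v = res.x[:n], res.x[n:]
    return u - v

# Toy usage: recover a sparse vector from a few random linear measurements.
rng = np.random.default_rng(1)
n, k, m = 128, 5, 40                        # ambient dimension, sparsity, measurements
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_hat = basis_pursuit(A, A @ x_true)        # should be close to x_true
```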

18,609 citations

Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
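As one concrete instance of the method surveyed above, here is a minimal ADMM sketch for the lasso, min 0.5‖Ax - b‖² + λ‖x‖_1, using the splitting x = z: an x-update via a cached Cholesky solve, a z-update by soft-thresholding, and a dual update. Parameter names and defaults are illustrative assumptions, not the review's reference code.

```python
import numpy as np

def soft_threshold(v, kappa):
    # proximal operator of kappa * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=100):
    m, n = A.shape
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)                           # scaled dual variable
    # factor (A'A + rho I) once; it is reused by every x-update
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(n_iter):
        # x-update: solve (A'A + rho I) x = A'b + rho (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        # z-update: proximal step for the l1 term (soft-thresholding)
        z = soft_threshold(x + u, lam / rho)
        # dual update
        u = u + x - z
    return z
```

Caching the factorization is what makes ADMM attractive at scale here: each iteration costs only triangular solves plus elementwise operations, and the same pattern distributes naturally when the data are split across machines.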

17,433 citations
