Journal ArticleDOI

Conditioning of Quasi-Newton Methods for Function Minimization

01 Jul 1970-Mathematics of Computation (American Mathematical Society (AMS))-Vol. 24, Iss: 111, pp 647-656
TL;DR: In this paper, a class of approximating matrices parameterized by a scalar is presented; the optimal conditioning of these matrices under an appropriate norm is investigated as a function of that parameter, and computational results verify that the new methods arising from conditioning considerations are superior to known methods.
Abstract: Quasi-Newton methods accelerate the steepest-descent technique for function minimization by using computational history to generate a sequence of approximations to the inverse of the Hessian matrix. This paper presents a class of approximating matrices as a function of a scalar parameter. The problem of optimal conditioning of these matrices under an appropriate norm as a function of the scalar parameter is investigated. A set of computational results verifies the superiority of the new methods arising from conditioning considerations to known methods.
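
The one-parameter family of inverse-Hessian approximations that this abstract refers to is, in modern notation, usually written as the Broyden family. The sketch below is a minimal illustration of such a family, not the paper's exact parameterization or its conditioning criterion; the function name and the choice of phi as the parameter are assumptions made here.

```python
import numpy as np

def broyden_family_update(H, s, y, phi=1.0):
    """One-parameter family of inverse-Hessian updates.

    phi = 0 gives the DFP update, phi = 1 the BFGS update; intermediate
    values interpolate between them.  H is the current inverse-Hessian
    approximation, s = x_{k+1} - x_k, y = g_{k+1} - g_k.
    """
    s = s.reshape(-1, 1)
    y = y.reshape(-1, 1)
    sy = float(s.T @ y)          # curvature; must be > 0 to keep H positive definite
    Hy = H @ y
    yHy = float(y.T @ Hy)

    # DFP update
    H_dfp = H + (s @ s.T) / sy - (Hy @ Hy.T) / yHy
    # rank-one correction that carries DFP toward BFGS as phi -> 1
    v = s / sy - Hy / yHy
    return H_dfp + phi * yHy * (v @ v.T)
```

A quasi-Newton iteration then takes steps of the form x_{k+1} = x_k - alpha_k * (H_k @ g_k), with a line search choosing alpha_k.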


Citations
Journal ArticleDOI
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, and reviews deep supervised learning, unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

14,635 citations

Journal ArticleDOI
TL;DR: In this paper, a new method for obtaining optimized parameters for semi-empirical methods has been developed and applied to the modified neglect of diatomic overlap (MNDO) method.
Abstract: A new method for obtaining optimized parameters for semiempirical methods has been developed and applied to the modified neglect of diatomic overlap (MNDO) method. The method uses derivatives of calculated values for properties with respect to adjustable parameters to obtain the optimized values of parameters. The large increase in speed is a result of using a simple series expression for calculated values of properties rather than employing full semiempirical calculations. With this optimization procedure, the rate-determining step for parameterizing elements changes from the mechanics of parameterization to the assembling of experimental reference data.
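
The procedure described here, which uses derivatives of calculated properties with respect to the adjustable parameters together with a simple series (linear) expansion in place of repeated full semiempirical calculations, reads like a Gauss-Newton least-squares fit. The sketch below is a generic illustration under that reading; the function names and calling convention are assumptions, not the authors' code.

```python
import numpy as np

def fit_parameters(p0, calc_props, jacobian, ref_props, n_iter=10):
    """Gauss-Newton-style parameter fit.

    calc_props(p) returns the properties computed with parameters p;
    jacobian(p) returns d(properties)/d(parameters).  Each iteration
    linearizes the properties about the current parameters, so the
    expensive full calculation is needed only once per iteration.
    """
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        r = calc_props(p) - ref_props          # residuals vs. reference data
        J = jacobian(p)                        # property derivatives
        dp, *_ = np.linalg.lstsq(J, -r, rcond=None)
        p = p + dp
        if np.linalg.norm(dp) < 1e-8:
            break
    return p
```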

7,125 citations

Journal ArticleDOI
TL;DR: In this paper, the average difference between predicted heats of formation and experimental values is 7.8 kcal/mol for 657 compounds and 13.6 kcal/mol for 106 hypervalent compounds.
Abstract: MNDO/AM1‐type parameters for twelve elements have been optimized using a newly developed method for optimizing parameters for semiempirical methods. With the new method, MNDO‐PM3, the average difference between the predicted heats of formation and experimental values for 657 compounds is 7.8 kcal/mol, and for 106 hypervalent compounds, 13.6 kcal/mol. For MNDO the equivalent differences are 13.9 and 75.8 kcal/mol, while those for AM1, in which MNDO parameters are used for aluminum, phosphorus, and sulfur, are 12.7 and 83.1 kcal/mol, respectively. Average errors for ionization potentials, bond angles, and dipole moments are intermediate between those for MNDO and AM1, while errors in bond lengths are slightly reduced.

3,465 citations

Journal ArticleDOI
TL;DR: In this paper, a modified conjugate gradient algorithm for geometry optimization is presented for use with ab initio MO methods, where the second derivative matrix rather than its inverse is updated employing the gradients.
Abstract: A modified conjugate gradient algorithm for geometry optimization is outlined for use with ab initio MO methods. Since the computation time for analytical energy gradients is approximately the same as for the energy, the optimization algorithm evaluates and utilizes the gradients each time the energy is computed. The second derivative matrix, rather than its inverse, is updated employing the gradients. At each step, a one-dimensional minimization using a quartic polynomial is carried out, followed by an n-dimensional search using the second derivative matrix. By suitably controlling the number of negative eigenvalues of the second derivative matrix, the algorithm can also be used to locate transition structures. Representative timing data for optimizations of equilibrium geometries and transition structures are reported for ab initio SCF–MO calculations.
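
The abstract describes updating the second-derivative (Hessian) matrix itself and controlling its number of negative eigenvalues to reach either minima or transition structures. The sketch below illustrates only that eigenvalue-control idea in a generic way (it is not Schlegel's algorithm): the step is formed in the Hessian eigenbasis, optionally following one mode uphill for a transition-structure search.

```python
import numpy as np

def controlled_step(grad, hess, follow_mode=None, max_step=0.3):
    """Newton-like step in the Hessian eigenbasis.

    follow_mode=None -> all curvatures treated as positive (minimization)
    follow_mode=i    -> step uphill along eigenvector i (transition search)
    """
    w, V = np.linalg.eigh(hess)          # eigenvalues w, eigenvectors in columns of V
    g = V.T @ grad                       # gradient in the eigenbasis
    step = np.zeros_like(g)
    for i, (wi, gi) in enumerate(zip(w, g)):
        curv = abs(wi) if abs(wi) > 1e-6 else 1e-6
        # minimize along this mode unless it is the mode being followed uphill
        step[i] = (gi / curv) if i == follow_mode else (-gi / curv)
    step = V @ step                      # back to the original coordinates
    n = np.linalg.norm(step)
    return step if n <= max_step else step * (max_step / n)
```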

3,373 citations

Book
01 Jan 2011
TL;DR: This book surveys optimization techniques, from classical methods and linear programming through one-dimensional, unconstrained, and constrained nonlinear programming to geometric, dynamic, integer, and stochastic programming and modern methods such as genetic algorithms, simulated annealing, and particle swarm optimization.
Abstract: Preface.
1 Introduction to Optimization: 1.1 Introduction. 1.2 Historical Development. 1.3 Engineering Applications of Optimization. 1.4 Statement of an Optimization Problem. 1.5 Classification of Optimization Problems. 1.6 Optimization Techniques. 1.7 Engineering Optimization Literature. 1.8 Solution of Optimization Problems Using MATLAB. References and Bibliography. Review Questions. Problems.
2 Classical Optimization Techniques: 2.1 Introduction. 2.2 Single-Variable Optimization. 2.3 Multivariable Optimization with No Constraints. 2.4 Multivariable Optimization with Equality Constraints. 2.5 Multivariable Optimization with Inequality Constraints. 2.6 Convex Programming Problem. References and Bibliography. Review Questions. Problems.
3 Linear Programming I: Simplex Method: 3.1 Introduction. 3.2 Applications of Linear Programming. 3.3 Standard Form of a Linear Programming Problem. 3.4 Geometry of Linear Programming Problems. 3.5 Definitions and Theorems. 3.6 Solution of a System of Linear Simultaneous Equations. 3.7 Pivotal Reduction of a General System of Equations. 3.8 Motivation of the Simplex Method. 3.9 Simplex Algorithm. 3.10 Two Phases of the Simplex Method. 3.11 MATLAB Solution of LP Problems. References and Bibliography. Review Questions. Problems.
4 Linear Programming II: Additional Topics and Extensions: 4.1 Introduction. 4.2 Revised Simplex Method. 4.3 Duality in Linear Programming. 4.4 Decomposition Principle. 4.5 Sensitivity or Postoptimality Analysis. 4.6 Transportation Problem. 4.7 Karmarkar's Interior Method. 4.8 Quadratic Programming. 4.9 MATLAB Solutions. References and Bibliography. Review Questions. Problems.
5 Nonlinear Programming I: One-Dimensional Minimization Methods: 5.1 Introduction. 5.2 Unimodal Function. ELIMINATION METHODS: 5.3 Unrestricted Search. 5.4 Exhaustive Search. 5.5 Dichotomous Search. 5.6 Interval Halving Method. 5.7 Fibonacci Method. 5.8 Golden Section Method. 5.9 Comparison of Elimination Methods. INTERPOLATION METHODS: 5.10 Quadratic Interpolation Method. 5.11 Cubic Interpolation Method. 5.12 Direct Root Methods. 5.13 Practical Considerations. 5.14 MATLAB Solution of One-Dimensional Minimization Problems. References and Bibliography. Review Questions. Problems.
6 Nonlinear Programming II: Unconstrained Optimization Techniques: 6.1 Introduction. DIRECT SEARCH METHODS: 6.2 Random Search Methods. 6.3 Grid Search Method. 6.4 Univariate Method. 6.5 Pattern Directions. 6.6 Powell's Method. 6.7 Simplex Method. INDIRECT SEARCH (DESCENT) METHODS: 6.8 Gradient of a Function. 6.9 Steepest Descent (Cauchy) Method. 6.10 Conjugate Gradient (Fletcher-Reeves) Method. 6.11 Newton's Method. 6.12 Marquardt Method. 6.13 Quasi-Newton Methods. 6.14 Davidon-Fletcher-Powell Method. 6.15 Broyden-Fletcher-Goldfarb-Shanno Method. 6.16 Test Functions. 6.17 MATLAB Solution of Unconstrained Optimization Problems. References and Bibliography. Review Questions. Problems.
7 Nonlinear Programming III: Constrained Optimization Techniques: 7.1 Introduction. 7.2 Characteristics of a Constrained Problem. DIRECT METHODS: 7.3 Random Search Methods. 7.4 Complex Method. 7.5 Sequential Linear Programming. 7.6 Basic Approach in the Methods of Feasible Directions. 7.7 Zoutendijk's Method of Feasible Directions. 7.8 Rosen's Gradient Projection Method. 7.9 Generalized Reduced Gradient Method. 7.10 Sequential Quadratic Programming. INDIRECT METHODS: 7.11 Transformation Techniques. 7.12 Basic Approach of the Penalty Function Method. 7.13 Interior Penalty Function Method. 7.14 Convex Programming Problem. 7.15 Exterior Penalty Function Method. 7.16 Extrapolation Techniques in the Interior Penalty Function Method. 7.17 Extended Interior Penalty Function Methods. 7.18 Penalty Function Method for Problems with Mixed Equality and Inequality Constraints. 7.19 Penalty Function Method for Parametric Constraints. 7.20 Augmented Lagrange Multiplier Method. 7.21 Checking the Convergence of Constrained Optimization Problems. 7.22 Test Problems. 7.23 MATLAB Solution of Constrained Optimization Problems. References and Bibliography. Review Questions. Problems.
8 Geometric Programming: 8.1 Introduction. 8.2 Posynomial. 8.3 Unconstrained Minimization Problem. 8.4 Solution of an Unconstrained Geometric Programming Problem Using Differential Calculus. 8.5 Solution of an Unconstrained Geometric Programming Problem Using Arithmetic-Geometric Inequality. 8.6 Primal-Dual Relationship and Sufficiency Conditions in the Unconstrained Case. 8.7 Constrained Minimization. 8.8 Solution of a Constrained Geometric Programming Problem. 8.9 Primal and Dual Programs in the Case of Less-Than Inequalities. 8.10 Geometric Programming with Mixed Inequality Constraints. 8.11 Complementary Geometric Programming. 8.12 Applications of Geometric Programming. References and Bibliography. Review Questions. Problems.
9 Dynamic Programming: 9.1 Introduction. 9.2 Multistage Decision Processes. 9.3 Concept of Suboptimization and Principle of Optimality. 9.4 Computational Procedure in Dynamic Programming. 9.5 Example Illustrating the Calculus Method of Solution. 9.6 Example Illustrating the Tabular Method of Solution. 9.7 Conversion of a Final Value Problem into an Initial Value Problem. 9.8 Linear Programming as a Case of Dynamic Programming. 9.9 Continuous Dynamic Programming. 9.10 Additional Applications. References and Bibliography. Review Questions. Problems.
10 Integer Programming: 10.1 Introduction. INTEGER LINEAR PROGRAMMING: 10.2 Graphical Representation. 10.3 Gomory's Cutting Plane Method. 10.4 Balas' Algorithm for Zero-One Programming Problems. INTEGER NONLINEAR PROGRAMMING: 10.5 Integer Polynomial Programming. 10.6 Branch-and-Bound Method. 10.7 Sequential Linear Discrete Programming. 10.8 Generalized Penalty Function Method. 10.9 Solution of Binary Programming Problems Using MATLAB. References and Bibliography. Review Questions. Problems.
11 Stochastic Programming: 11.1 Introduction. 11.2 Basic Concepts of Probability Theory. 11.3 Stochastic Linear Programming. 11.4 Stochastic Nonlinear Programming. 11.5 Stochastic Geometric Programming. References and Bibliography. Review Questions. Problems.
12 Optimal Control and Optimality Criteria Methods: 12.1 Introduction. 12.2 Calculus of Variations. 12.3 Optimal Control Theory. 12.4 Optimality Criteria Methods. References and Bibliography. Review Questions. Problems.
13 Modern Methods of Optimization: 13.1 Introduction. 13.2 Genetic Algorithms. 13.3 Simulated Annealing. 13.4 Particle Swarm Optimization. 13.5 Ant Colony Optimization. 13.6 Optimization of Fuzzy Systems. 13.7 Neural-Network-Based Optimization. References and Bibliography. Review Questions. Problems.
14 Practical Aspects of Optimization: 14.1 Introduction. 14.2 Reduction of Size of an Optimization Problem. 14.3 Fast Reanalysis Techniques. 14.4 Derivatives of Static Displacements and Stresses. 14.5 Derivatives of Eigenvalues and Eigenvectors. 14.6 Derivatives of Transient Response. 14.7 Sensitivity of Optimum Solution to Problem Parameters. 14.8 Multilevel Optimization. 14.9 Parallel Processing. 14.10 Multiobjective Optimization. 14.11 Solution of Multiobjective Problems Using MATLAB. References and Bibliography. Review Questions. Problems.
A Convex and Concave Functions.
B Some Computational Aspects of Optimization: B.1 Choice of Method. B.2 Comparison of Unconstrained Methods. B.3 Comparison of Constrained Methods. B.4 Availability of Computer Programs. B.5 Scaling of Design Variables and Constraints. B.6 Computer Programs for Modern Methods of Optimization. References and Bibliography.
C Introduction to MATLAB(R): C.1 Features and Special Characters. C.2 Defining Matrices in MATLAB. C.3 Creating m-Files. C.4 Optimization Toolbox.
Answers to Selected Problems. Index.
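
Of the one-dimensional minimization methods listed in Chapter 5 above, golden-section search (Section 5.8) is compact enough to sketch directly; the code below is a generic textbook illustration, not material from the book.

```python
import math

def golden_section(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on the interval [a, b]."""
    inv_phi = (math.sqrt(5) - 1) / 2          # ~0.618, the golden ratio conjugate
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):                       # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                                 # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

# Example: the minimum of (x - 2)^2 on [0, 5] is found at x ~= 2
print(golden_section(lambda x: (x - 2) ** 2, 0.0, 5.0))
```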

3,283 citations

References
Journal ArticleDOI
TL;DR: A number of theorems are proved to show that the method always converges and converges rapidly, and it has been used to solve a system of one hundred non-linear simultaneous equations.
Abstract: A powerful iterative descent method for finding a local minimum of a function of several variables is described. A number of theorems are proved to show that it always converges and that it converges rapidly. Numerical tests on a variety of functions confirm these theorems. The method has been used to solve a system of one hundred non-linear simultaneous equations.
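
The "powerful iterative descent method" described here is the Davidon-Fletcher-Powell (DFP) variable-metric method. A minimal driver loop under that reading is sketched below; the backtracking line search is a deliberately crude stand-in for the accurate (interpolation-based) line search the original method relies on, and the function names are illustrative.

```python
import numpy as np

def dfp_minimize(f, grad, x0, n_iter=200, tol=1e-8):
    """Quasi-Newton minimization with the DFP inverse-Hessian update."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                   # initial inverse-Hessian estimate
    g = grad(x)
    for _ in range(n_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g                       # search direction
        alpha = 1.0                      # crude backtracking line search
        while f(x + alpha * d) > f(x) and alpha > 1e-12:
            alpha *= 0.5
        s = alpha * d
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        sy = s @ y
        if sy > 1e-12:                   # skip the update if curvature is not positive
            Hy = H @ y
            H = H + np.outer(s, s) / sy - np.outer(Hy, Hy) / (y @ Hy)
        x, g = x_new, g_new
    return x
```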

4,305 citations

Journal ArticleDOI
TL;DR: In this paper, a rank-two variable-metric method was derived using Greenstadt's variational approach, which preserves the positive-definiteness of the approximating matrix.
Abstract: A new rank-two variable-metric method is derived using Greenstadt's variational approach [Math. Comp., this issue]. Like the Davidon-Fletcher-Powell (DFP) variable-metric method, the new method preserves the positive-definiteness of the approximating matrix. Together with Greenstadt's method, the new method gives rise to a one-parameter family of variable-metric methods that includes the DFP and rank-one methods as special cases. It is equivalent to Broyden's one-parameter family [Math. Comp., v. 21, 1967, pp. 368-381]. Choices for the inverse of the weighting matrix in the variational approach are given that lead to the derivation of the DFP and rank-one methods directly. In the preceding paper [6], Greenstadt derives two variable-metric methods, using a classical variational approach. Specifically, two iterative formulas are developed for updating the matrix H_k (i.e., the inverse of the variable metric), where H_k is an approximation to the inverse Hessian G^{-1}(x_k) of the function being minimized. Using the iteration formula H_{k+1} = H_k + E_k to provide revised estimates of the inverse Hessian at each step, Greenstadt solves for the correction term E_k that minimizes the norm N(E_k) = Tr(W E_k W E_k^T) subject to the conditions
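
In modern notation, the variational problem sketched at the end of this abstract presumably takes the following form (a reconstruction from the abstract's description, not a quotation of Greenstadt's or Goldfarb's paper):

```latex
\min_{E_k}\; N(E_k) = \operatorname{Tr}\!\left( W E_k W E_k^{T} \right)
\quad\text{subject to}\quad
(H_k + E_k)\, y_k = s_k , \qquad E_k = E_k^{T},
```

where s_k = x_{k+1} - x_k, y_k = g_{k+1} - g_k, and W is a positive-definite weighting matrix; different choices of W (or its inverse) then yield the DFP, rank-one, and related updates.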

2,788 citations

Journal ArticleDOI
TL;DR: In this article, the authors discuss certain modifications to Newton's method designed to reduce the number of function evaluations required during an iterative solution process, since the most efficient process will tend to be the one that requires the smallest number of function evaluations.
Abstract: solution. The functions that require zeroing are real functions of real variables and it will be assumed that they are continuous and differentiable with respect to these variables. In many practical examples they are extremely complicated and hence laborious to compute, and this fact has two important immediate consequences. The first is that it is impracticable to compute any derivative that may be required by the evaluation of the algebraic expression of this derivative. If derivatives are needed they must be obtained by differencing. The second is that during any iterative solution process the bulk of the computing time will be spent in evaluating the functions. Thus, the most efficient process will tend to be that which requires the smallest number of function evaluations. This paper discusses certain modifications to Newton's method designed to reduce the number of function evaluations required. Results of various numerical experiments are given and conditions under which the modified versions are superior to the original are tentatively suggested.
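
The modifications alluded to replace repeated evaluation of derivatives (by differencing) with a cheap correction to an approximate Jacobian built from quantities already computed. The sketch below shows a secant-style rank-one update of this general kind (Broyden's "good" update in later terminology), used here as an illustration rather than as the paper's exact scheme.

```python
import numpy as np

def broyden_solve(F, x0, n_iter=100, tol=1e-10):
    """Solve F(x) = 0 with a rank-one (secant) Jacobian update.

    The Jacobian approximation B is never recomputed by differencing;
    it is corrected from the step s and the change in residuals dF,
    so each iteration costs only one evaluation of F.
    """
    x = np.asarray(x0, dtype=float)
    Fx = F(x)
    B = np.eye(x.size)                   # initial Jacobian estimate
    for _ in range(n_iter):
        if np.linalg.norm(Fx) < tol:
            break
        s = np.linalg.solve(B, -Fx)      # quasi-Newton correction
        x = x + s
        F_new = F(x)
        dF = F_new - Fx
        # rank-one update enforcing the secant condition B_new @ s = dF
        B = B + np.outer(dF - B @ s, s) / (s @ s)
        Fx = F_new
    return x
```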

2,481 citations

Journal ArticleDOI
TL;DR: The Newton-Raphson method as mentioned in this paper is one of the most commonly used methods for solving nonlinear problems, where the corrections are computed as linear combinations of the residuals.
Abstract: can in general only be found by an iterative process in which successively better, in some sense, approximations to the solution are computed. Of the methods available most rely on evaluating at each stage of the calculation a set of residuals and from these obtaining a correction to each element of the approximate solution. The most common way of doing this is to take each correction to be a suitable linear combination of the residuals. There is, of course, no reason in principle why more elaborate schemes should not be used but they are difficult both to analyse theoretically and to implement in practice. The minimisation of a function of n variables, for which it is possible to obtain analytic expressions for the n first partial derivatives, is a particular example of this type of problem. Any technique used to solve nonlinear equations may be applied to the expressions for the partial derivatives but, because it is known in this case that the residuals form the gradient of some function, it is possible to introduce refinements into the method of solution to take account of this extra information. Since, in addition, the value of the function itself is known, further refinements are possible. The best-known method of solving a general set of simultaneous nonlinear equations, in which the corrections are computed as linear combinations of the residuals, is the Newton-Raphson method. The principal disadvantage of this method lies in the necessity of evaluating and inverting the Jacobian matrix at each stage of the iteration and so a number of methods have arisen, e.g. [1], [2], [4] and [8] in which the inverse Jacobian matrix is replaced by an approximation which is modified in some simple manner at each iteration. Although each method has its own peculiarities certain properties are common to a large class of these methods, and several of these are discussed here. In particular, if it is known that the functions to be zeroed are the first partial derivatives of a function F, then it is possible, if F is quadratic, to modify the approximating matrix in such a way that F is minimised in a finite number of steps. This method of modification is not unique and leads to a subclass of methods of which one example is the method of Davidon [3] as amended by Fletcher and Powell [4]. Since in the methods under discussion the corrections are computed as linear combinations of the residuals, it is natural to introduce matrix notation. Thus a function f_j of the variables x_1, x_2, ..., x_n may be regarded as a function of the nth order vector x, and each f_j in turn may be treated as the jth element of the nth
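
For contrast with the approximate-inverse schemes the passage goes on to discuss, a bare Newton-Raphson iteration for a system F(x) = 0 looks like the generic sketch below; the Jacobian is formed by forward differencing, which is exactly the per-iteration cost that the quasi-Newton modifications aim to avoid.

```python
import numpy as np

def newton_raphson(F, x0, h=1e-7, n_iter=50, tol=1e-10):
    """Newton-Raphson for F(x) = 0 with a finite-difference Jacobian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        # forward-difference Jacobian: n extra evaluations of F per step
        J = np.empty((x.size, x.size))
        for j in range(x.size):
            xh = x.copy()
            xh[j] += h
            J[:, j] = (F(xh) - Fx) / h
        # the correction is a linear combination of the residuals
        x = x + np.linalg.solve(J, -Fx)
    return x
```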

598 citations


"Conditioning of Quasi-Newton Method..." refers methods in this paper

  • ...This feature is also true of Broyden's method defined in [10], but not of those devised in [3] (see [6])....


  • ...C. G. Broyden, "Quasi-Newton methods and their application to function minimisation," Math....


  • ...Other parametric separations are possible, and have been developed by Broyden [10] and Goldfarb [11]....


  • ...C. G. Broyden, "A class of methods for solving nonlinear simultaneous equations," Math....


  • ...Some well-known techniques of this type are the Fletcher-Powell modification of Davidon's method [1], [2], Broyden methods [3], [10], the Barnes-Rosen method [4], [5], and Goldfarb's method [11]....


Journal ArticleDOI
TL;DR: Transformations whereby inequality constraints of certain forms can be eliminated from the formulation of an optimization problem are described, and examples of their use are compared with other methods for handling such constraints.
Abstract: The performances of eight current methods for unconstrained optimization are evaluated using a set of test problems with up to twenty variables. The use of optimization techniques in the solution of simultaneous non-linear equations is also discussed. Finally transformations whereby inequality constraints of certain forms can be eliminated from the formulation of an optimization problem are described, and examples of their use compared with other methods for handling such constraints.
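
One transformation of the kind described, removing a simple bound constraint lo <= x <= hi by writing x in terms of an unconstrained variable, is sketched below. This is an illustration of the general idea only; the paper catalogues several such transformations and the sin^2 form shown here may not match its exact list.

```python
import numpy as np

def free_to_bounded(y, lo, hi):
    """Any real y gives x = lo + (hi - lo) * sin^2(y), which always lies in [lo, hi]."""
    return lo + (hi - lo) * np.sin(y) ** 2

def bounded_to_free(x, lo, hi):
    """Inverse map: a starting point x in [lo, hi] becomes an unconstrained y."""
    t = (x - lo) / (hi - lo)
    return np.arcsin(np.sqrt(t))

# An unconstrained minimizer applied to g(y) = f(free_to_bounded(y, lo, hi))
# automatically respects the bounds on x.
```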

377 citations


"Conditioning of Quasi-Newton Method..." refers background in this paper

  • ...They are the sum of two exponentials documented by Box [7], and defined by...


  • ...They are the sum of two exponentials documented by Box [7], and defined by (40) f(x_1, x_2) = Σ_i [(e^{-x_1 t_i} - e^{-x_2 t_i}) - (e^{-t_i} - e^{-10 t_i})]^2, where t_i ranges from .1 to 1 in steps of .1; Rosenbrock's function with the initial estimates suggested by Leon [8], and defined by (41) f(x_1, x_2) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2; Wood's function as documented by Pearson [9], and defined by (42) f(x_1, x_2, x_3, x_4) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2 + 90(x_4 - x_3^2)^2 + (1 - x_3)^2 + 10.1[(x_2 - 1)^2 + (x_4 - 1)^2] + 19.8(x_2 - 1)(x_4 - 1); and finally the Weibull function, defined by (43) f(x_1, x_2, x_3) = Σ_i [exp(-(t_i - x_3)^{x_2} / x_1) - y_i]^2, where the y_i and t_i are perfect data generated for the 99 points corresponding to y = .1 to .99, in steps of .01, for the values x_1 = 50, x_2 = 1.5, x_3 = 25.... (Runnable definitions of these four test functions are sketched at the end of this list.)


  • ...M. J. Box, "A comparison of several current optimization methods, and the use of transformations in constrained problems," Comput....


  • ...Box's three-parameter exponential problem was also tried, but nonuniqueness of the optimum caused different methods to converge to different optima, invalidating comparisons....

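
The four test functions quoted above (Box's two-exponential sum, Rosenbrock's function, Wood's function, and the Weibull fitting function) can be written out for experimentation. The sketch below follows the reconstructed formulas; the function names are illustrative, and the exponent form and data generation for the Weibull problem should be checked against the paper.

```python
import numpy as np

def box_two_exp(x):
    """Box's sum-of-two-exponentials test function, t_i = 0.1, 0.2, ..., 1.0."""
    t = np.arange(1, 11) / 10.0
    return np.sum(((np.exp(-x[0] * t) - np.exp(-x[1] * t))
                   - (np.exp(-t) - np.exp(-10.0 * t))) ** 2)

def rosenbrock(x):
    """Rosenbrock's banana-valley function in two variables."""
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def wood(x):
    """Wood's four-variable test function."""
    return (100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
            + 90.0 * (x[3] - x[2] ** 2) ** 2 + (1.0 - x[2]) ** 2
            + 10.1 * ((x[1] - 1.0) ** 2 + (x[3] - 1.0) ** 2)
            + 19.8 * (x[1] - 1.0) * (x[3] - 1.0))

def weibull(x, t, y):
    """Weibull fitting function; t, y are the 'perfect data' described in the quote.

    Assumes t > x[2] so the fractional power stays real.
    """
    return np.sum((np.exp(-((t - x[2]) ** x[1]) / x[0]) - y) ** 2)
```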