Q2. What is the key issue in reducing the gradient eigencomponents in a more balanced way?
A suitable alternation of small and large steplengths appears to be a key issue in reducing the gradient eigencomponents in a more balanced way.
Q3. How should the inverses of the steplengths be chosen?
The inverses of the steplengths must be chosen as symmetric pairs, in the sense that $1/\alpha_{2k+1} = \lambda_1 + \lambda_n - 1/\alpha_{2k}$ for sufficiently large $k$.
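To make the pairing concrete, here is a minimal numerical sketch; the eigenvalue bounds and the even-indexed steplength are hypothetical values chosen only for illustration.

```python
# Hypothetical extreme eigenvalues of the Hessian (illustrative values).
lam_1, lam_n = 1.0, 100.0

def symmetric_partner(alpha_even):
    """Return alpha_{2k+1} such that 1/alpha_{2k+1} and 1/alpha_{2k}
    form a symmetric pair: 1/alpha_{2k+1} = lam_1 + lam_n - 1/alpha_{2k}."""
    return 1.0 / (lam_1 + lam_n - 1.0 / alpha_even)

alpha_even = 1.0 / 90.0                    # small steplength (large inverse)
alpha_odd = symmetric_partner(alpha_even)  # large steplength (small inverse)
print(1.0 / alpha_even + 1.0 / alpha_odd)  # the two inverses sum to lam_1 + lam_n
```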
Q4. What is the way to avoid the zigzagging pattern of the gradient?
A possibility for avoiding the zigzagging pattern of the gradient is to foster the sequence {1/αk} to sweep all the spectrum of the Hessian matrix.
Q5. How can the gradient methods be extended to the general minimization problem?
Among the gradient methods analysed in the previous section, BB1, LMSD and ABBmin can be extended in a natural way to the general minimization problem (1), using line search strategies to ensure convergence to a stationary point [30, 46, 24].
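As a rough illustration of what such a globalization might look like, the sketch below combines the BB1 steplength with a simple monotone Armijo backtracking; the cited works use more sophisticated (e.g. nonmonotone) line searches, so this is only a hedged stand-in, and all parameter values are assumed.

```python
import numpy as np

def bb1_linesearch(f, grad, x0, alpha_min=1e-10, alpha_max=1e6,
                   max_iter=1000, tol=1e-6):
    """Sketch of a globalized BB1 gradient method for general smooth f."""
    x = x0
    g = grad(x)
    alpha = 1.0 / np.linalg.norm(g)        # assumed initial steplength
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        t = alpha
        # Monotone Armijo backtracking on the trial steplength.
        while f(x - t * g) > f(x) - 1e-4 * t * (g @ g):
            t *= 0.5
        x_new = x - t * g
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        # BB1 trial steplength, safeguarded in [alpha_min, alpha_max];
        # fall back to alpha_max when the curvature estimate is not positive.
        alpha = np.clip((s @ s) / sy, alpha_min, alpha_max) if sy > 0 else alpha_max
        x, g = x_new, g_new
    return x
```

For example, `bb1_linesearch(lambda x: 0.5 * x @ x, lambda x: x, np.ones(3))` converges to the origin.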
Q6. What is the steplength of the Minimal Gradient method?
Note that $\alpha_k^{BB1}$ is equal to the Cauchy steplength at iteration $k-1$, i.e., $\alpha_{k-1}^{SD}$, while $\alpha_k^{BB2}$ is equal to the steplength of the Minimal Gradient method at iteration $k-1$, i.e., $\alpha_{k-1}^{MG} = \arg\min_{\alpha>0} \|\nabla f(x_{k-1} - \alpha g_{k-1})\|$.
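For a strictly convex quadratic these identities are easy to verify numerically, using the standard formulas $\alpha_k^{BB1} = s^T s / (s^T y)$ and $\alpha_k^{BB2} = s^T y / (y^T y)$; the Hessian and the previous steplength in the sketch below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.diag(rng.uniform(1.0, 100.0, 5))   # assumed SPD Hessian, f(x) = 0.5 x'Ax
x = rng.standard_normal(5)
g = A @ x                                  # gradient g_{k-1}

alpha_sd = (g @ g) / (g @ (A @ g))              # Cauchy (SD) steplength at k-1
alpha_mg = (g @ (A @ g)) / ((A @ g) @ (A @ g))  # MG steplength at k-1

x_new = x - 0.01 * g                       # any previous steplength works here
s, y = x_new - x, A @ x_new - g            # s_{k-1}, y_{k-1} = A s_{k-1}
alpha_bb1 = (s @ s) / (s @ y)
alpha_bb2 = (s @ y) / (y @ y)
print(np.isclose(alpha_bb1, alpha_sd), np.isclose(alpha_bb2, alpha_mg))  # True True
```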
Q7. What is another technique for building steplengths?
Another technique to build steplengths such that the corresponding gradient method approaches the optimal complexity is based on the use of the Chebyshev nodes, i.e., the roots of the Chebyshev polynomial of the first kind.
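A minimal sketch of this construction, assuming the extreme eigenvalues $\lambda_1$ and $\lambda_n$ are known (the values below are hypothetical): the roots of the degree-$m$ Chebyshev polynomial of the first kind on $[-1, 1]$ are mapped onto $[\lambda_1, \lambda_n]$, and their reciprocals are taken as steplengths.

```python
import numpy as np

lam_1, lam_n, m = 1.0, 100.0, 8            # assumed spectrum bounds and degree
j = np.arange(1, m + 1)
nodes = np.cos((2 * j - 1) * np.pi / (2 * m))   # Chebyshev nodes in (-1, 1)
# Affine map of the nodes onto [lam_1, lam_n]; their inverses are the steps.
inv_steps = 0.5 * (lam_1 + lam_n) + 0.5 * (lam_n - lam_1) * nodes
steplengths = 1.0 / inv_steps
```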
Q8. What safeguard is applied regardless of the steplength rule?
Regardless of the steplength rule, all the methods keep the sequence of tentative steplengths $\{\alpha_k\}$ bounded below and above by the positive constants $\alpha_{\min}$ and $\alpha_{\max}$.
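In code, this safeguard is just a projection onto the interval; a one-line sketch with illustrative bounds:

```python
alpha_min, alpha_max = 1e-10, 1e6          # illustrative safeguard bounds

def safeguard(alpha_trial):
    """Project a tentative steplength onto [alpha_min, alpha_max]."""
    return min(max(alpha_trial, alpha_min), alpha_max)
```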
Q9. What is the drawback of the LMSD method?
As shown in Figure 8, when $x_k$ is far from $x^*$, the LMSD method with $m_s = 5$ generates some very small steplengths whose inverses fall outside the spectra of the Hessian matrices; the choice $m_s = 3$ mitigates this drawback, thanks to the smaller number of previous gradients taken into account.
Q10. Why is the gradient method not proposed as a practical algorithm?
It is worth noting that the author of [29] points out that the gradient method described there is not proposed as a practical algorithm, but only to prove that a complexity bound is achievable.
Q11. What is the convergence rate of the BB methods?
The convergence rate of these BB-related methods is generally R-linear, but, like the original BB methods, their practical convergence behaviour is superior to that of SD.
Q12. How do the values $1/\nu_k$ generated by LMSD relate to the spectra of the Hessian matrices?
The values of $1/\nu_k$ generated by LMSD during a sweep attempt to travel in the spectra of the Hessian matrices corresponding to that sweep; in particular, the extreme Ritz values obtained in a sweep can be considered as an attempt to approximate the extreme eigenvalues of the Hessians in that sweep.
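To illustrate the idea on a quadratic with known Hessian $A$: the Ritz values can be viewed as the eigenvalues of the projection of $A$ onto the span of the $m_s$ most recent gradients. LMSD recovers them from the stored gradients without forming $A$; the explicit projection below is only a conceptual sketch, with all problem data hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, ms = 50, 5
A = np.diag(rng.uniform(1.0, 100.0, n))    # assumed SPD Hessian
x = rng.standard_normal(n)

grads = []
for _ in range(ms):                        # collect m_s back gradients
    g = A @ x
    grads.append(g)
    x = x - g / np.linalg.norm(g)          # placeholder steplengths

Q, _ = np.linalg.qr(np.column_stack(grads))
ritz = np.linalg.eigvalsh(Q.T @ A @ Q)     # Ritz values lie in the spectrum of A
steplengths = 1.0 / ritz                   # inverses used as the next sweep's steps
```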
Q13. How many iterations does ABBmin require?
The number of iterations of ABBmin ranges between 27% and 69% of the number of iterations of BB1; on NQP1, the latter method is not able to achieve the required accuracy within 5000 iterations.