A Subspace Minimization Method for the Trust-Region Step
Frequently Asked Questions (14)
Q2. What is the main purpose of preconditioning?
Since the main purpose of preconditioning is to reduce the number of CG iterations (and hence the number of matrix-vector products), it is useful to compare the number of products before and after preconditioning.
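To make this comparison concrete, here is a minimal sketch, using SciPy, of counting matrix-vector products in CG with and without a preconditioner. The tridiagonal test matrix and the Jacobi (diagonal) preconditioner are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg, LinearOperator

n = 1000
# Ill-scaled SPD tridiagonal test matrix (a hypothetical stand-in for a Hessian).
d = np.linspace(2.0, 1e4, n)
A = diags([d, -np.ones(n - 1), -np.ones(n - 1)], [0, -1, 1], format="csr")
b = np.ones(n)

def products_used(M=None):
    count = {"n": 0}
    def matvec(x):
        count["n"] += 1                 # one matrix-vector product per call
        return A @ x
    op = LinearOperator((n, n), matvec=matvec, dtype=float)
    cg(op, b, M=M)
    return count["n"]

# Jacobi preconditioner: apply the inverse of diag(A).
M = LinearOperator((n, n), matvec=lambda r: r / d, dtype=float)

print("products without preconditioning:", products_used())
print("products with diagonal PCG:      ", products_used(M))
```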
Q3. Why did the authors omit the 5 problems fminsurf, penalty1, penalty2, power and vareigvl?
The authors also omitted the 5 problems fminsurf, penalty1, penalty2, power and vareigvl because of Matlab memory limitations when extracting the diagonals from the CUTEr Hessian.
Q4. What was the preconditioner for the icfs software?
For IP-SSM, the preconditioner was the incomplete Cholesky factorization of the positive-definite matrix $H_j + \sigma_a I$, where $\sigma_a$ is the initial value of the accelerator variable (usually $\sigma_e$; see Algorithm ipAccelerator).
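For illustration, a minimal dense sketch of a zero-fill incomplete Cholesky factorization IC(0) applied to such a shifted matrix; the function name, dense storage, and the absence of fill control are simplifying assumptions (the icfs software uses sparse storage with limited additional fill).

```python
import numpy as np

def ichol0(A):
    """Zero-fill incomplete Cholesky IC(0): L keeps the sparsity pattern of
    tril(A), so L @ L.T only approximates A when A has zero entries."""
    n = A.shape[0]
    L = np.tril(np.asarray(A, dtype=float))
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])
        L[k + 1:, k] /= L[k, k]          # scale column k below the diagonal
        for j in range(k + 1, n):
            for i in range(j, n):
                if L[i, j] != 0.0:       # update existing nonzeros only
                    L[i, j] -= L[i, k] * L[j, k]
    return L

# To precondition H_j + sigma_a * I (names from the text; H, sigma_a not
# defined here):
#   L = ichol0(H + sigma_a * np.eye(H.shape[0]))
# and apply the preconditioner by two triangular solves with L and L.T.
```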
Q5. What is the key to the success of the method?
The authors have considered an interior-point sequential subspace minimization method (IP-SSM) that solves the inequality-constrained trust-region subproblem over a sequence of evolving low-dimensional subspaces.
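As a rough illustration of subspace minimization in general (not the authors' interior-point IP-SSM algorithm), the sketch below minimizes the quadratic model over a small subspace with orthonormal basis V, solving the reduced trust-region subproblem by eigendecomposition; the treatment of the hard case is omitted, and all names are hypothetical.

```python
import numpy as np

def reduced_trs(B, c, delta):
    """Minimize 0.5*y'By + c'y subject to ||y|| <= delta for a small dense B,
    via eigendecomposition and bisection on the secular equation.
    (The so-called hard case is not handled in this sketch.)"""
    w, Q = np.linalg.eigh(B)
    z = Q.T @ c
    if w[0] > 0:                          # try the interior (Newton) solution
        y = Q @ (-z / w)
        if np.linalg.norm(y) <= delta:
            return y
    # Boundary solution: find sigma >= max(0, -w[0]) with ||y(sigma)|| = delta.
    norm = lambda s: np.linalg.norm(z / (w + s))
    lo = max(0.0, -w[0]) + 1e-12
    hi = lo + 1.0
    while norm(hi) > delta:
        hi *= 2.0
    for _ in range(200):                  # plain bisection on sigma
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if norm(mid) > delta else (lo, mid)
    return Q @ (-z / (w + hi))

def subspace_step(H, g, delta, V):
    """Trust-region step restricted to span(V). V has orthonormal columns,
    so ||V @ y|| = ||y|| and the constraint transfers to the subspace."""
    y = reduced_trs(V.T @ (H @ V), V.T @ g, delta)
    return V @ y
```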
Q6. How many functions are evaluated after preconditioning?
If results from these problems are excluded from the totals, the overall increase in function evaluations for Steihaug-Toint, GLTR, and IP-SSM decreases to 82%, 88%, and 18%, respectively.
Q7. What is the function $\pi_s$ for each method?
For each method $s$ the authors define the function $\pi_s : [0, r_M] \mapsto \mathbb{R}_+$ such that $\pi_s(\tau) = \frac{1}{\mathrm{card}(P)}\,\mathrm{card}(\{p \in P : \log_2(r_{p,s}) \le \tau\})$, where $r_{p,s}$ denotes the ratio of the number of function evaluations needed to solve problem $p$ with method $s$ to the least number of function evaluations needed to solve problem $p$.
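A short sketch of how such log2-scaled performance profiles can be computed; the dictionary layout and the sample numbers are assumptions for illustration, not the paper's data.

```python
import numpy as np

def performance_profiles(fevals):
    """fevals[s][p] = function evaluations method s needed on problem p
    (use np.inf for a failure). Returns pi(s, tau) as defined above."""
    methods = list(fevals)
    problems = list(next(iter(fevals.values())))
    best = {p: min(fevals[s][p] for s in methods) for p in problems}
    def pi(s, tau):
        ratios = [fevals[s][p] / best[p] for p in problems]
        return sum(np.log2(r) <= tau for r in ratios) / len(problems)
    return pi

fevals = {  # illustrative numbers, not the paper's data
    "Steihaug-Toint": {"p1": 40, "p2": 60, "p3": 90},
    "GLTR":           {"p1": 35, "p2": 80, "p3": 70},
    "IP-SSM":         {"p1": 30, "p2": 50, "p3": 75},
}
pi = performance_profiles(fevals)
print(pi("IP-SSM", 0.0))  # fraction of problems IP-SSM solves with fewest evals
```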
Q8. What is the way to reduce the number of function evaluations?
A preconditioner should reduce the number of matrix-vector products without increasing the number of function evaluations.
Q9. What is the percentage of problems that were solved using Cholesky preconditioning?
If the statistics for the 9 solved problems from dixmaana–dixmaanl are excluded, the percentage of cases for which function evaluations increased becomes 68%, 75%, and 20% for Steihaug-Toint, GLTR, and IP-SSM, respectively.
Q10. What is the result of the CG scaling?
This scaling leads to a severely ill-conditioned Hessian for which a matrix-vector product has little or no precision, often causing a complete breakdown of the CG iterations.
Q11. What are the corresponding values of the function evaluations (fe) and products (prds)?
A method was considered to have solved a problem successfully when the iterate $x_j$ satisfied $\|g(x_j)\|_2 \le \max\{\epsilon\,\|g(x_0)\|_2,\; \epsilon\,|f(x_0)|,\; 10^{-5}\}$ (3.3), with $\epsilon = 10^{-6}$.
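A direct transcription of test (3.3) as a small Python helper; the argument names are assumptions.

```python
import numpy as np

def converged(g_xj, g_x0, f_x0, eps=1e-6):
    """Test (3.3): ||g(x_j)||_2 <= max(eps*||g(x_0)||_2, eps*|f(x_0)|, 1e-5)."""
    threshold = max(eps * np.linalg.norm(g_x0), eps * abs(f_x0), 1e-5)
    return np.linalg.norm(g_xj) <= threshold
```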
Q12. How many function evaluations did GLTR and Steihaug-Toint require?
Of the 46 problems included in the summary of Table 4, diagonally preconditioned IP-SSM required more function evaluations than unpreconditioned IP-SSM for only 9 problems (20% of the cases).
Q13. What was the result of the tests?
The methods were tested with a diagonal preconditioner based on the matrix $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ of diagonal entries of the Hessian evaluated at $x_j$.
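A minimal sketch of building such a diagonal (Jacobi) preconditioner for use with a Krylov solver, assuming a dense Hessian; the safeguards against zero or negative diagonal entries are an added assumption, since the paper's exact handling of indefinite diagonals isn't quoted here.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

def diagonal_preconditioner(H, floor=1e-8):
    """Jacobi preconditioner M ~ inv(diag(H)); pass as M= to a Krylov solver
    such as scipy.sparse.linalg.cg."""
    d = np.maximum(np.abs(np.diag(H)), floor)  # guard: keep M positive definite
    n = H.shape[0]
    return LinearOperator((n, n), matvec=lambda r: r / d, dtype=float)
```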
Q14. How many problems were solved using a Cholesky preconditioning?
Of the 49 problems included in the summary of Table 9, incomplete Cholesky preconditioned IP-SSM required more function evaluations than unpreconditioned IP-SSM for 12 problems (25% of the cases).