Nonsmooth optimization via quasi-Newton methods
Citations
306 citations
Cites background from "Nonsmooth optimization via quasi-Newton methods"
...The choice of L-BFGS for non-convex and non-smooth optimization is well supported [25, 26]....
[...]
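For context, the L-BFGS method referred to in these excerpts builds its search direction with the standard two-loop recursion over a short history of curvature pairs. Below is a minimal sketch of that recursion (function and variable names are illustrative, not from the paper):

    import numpy as np

    def lbfgs_direction(grad, s_list, y_list):
        # Two-loop recursion: returns -H @ grad, where H is the implicit
        # L-BFGS inverse-Hessian approximation built from the stored pairs
        # s_k = x_{k+1} - x_k and y_k = grad_{k+1} - grad_k.
        if not s_list:
            return -grad                  # no curvature pairs yet: steepest descent
        q = grad.astype(float).copy()
        rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
        alphas = []
        # First loop: newest pair to oldest.
        for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
            alpha = rho * np.dot(s, q)
            alphas.append(alpha)
            q -= alpha * y
        # Initial scaling H0 = gamma * I with the usual choice gamma = s.y / y.y.
        s, y = s_list[-1], y_list[-1]
        q *= np.dot(s, y) / np.dot(y, y)
        # Second loop: oldest pair to newest.
        for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
            beta = rho * np.dot(y, q)
            q += (alpha - beta) * s
        return -q                         # quasi-Newton descent direction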
297 citations
Cites methods from "Nonsmooth optimization via quasi-Newton methods"
...In particular, prior work (Yu et al., 2010; Lewis and Overton, 2013) has established theoretically sound modifications to L-BFGS for non-smooth non-convex optimization....
[...]
226 citations
Additional excerpts
...Even if convergence is not guaranteed with this implementation, in practice, due to the use of an inexact line search [Lewis and Overton 2013] and our bound constraints, the parameter updates do not run into non-differentiable values....
[...]
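The inexact line search of Lewis and Overton enforces the weak (rather than strong) Wolfe conditions, which remain meaningful for nonsmooth objectives, by bracketing and bisection. A minimal sketch of such a search is shown below; it assumes f_and_grad returns the pair (f(x), ∇f(x)) as numpy objects, and the names are illustrative:

    def weak_wolfe_step(f_and_grad, x, d, c1=1e-4, c2=0.5, max_iter=50):
        # Bisection line search enforcing the weak Wolfe conditions:
        #   (i)  f(x + t d) <= f(x) + c1 * t * (g0 . d)   (sufficient decrease)
        #   (ii) g(x + t d) . d >= c2 * (g0 . d)          (one-sided curvature)
        f0, g0 = f_and_grad(x)
        slope0 = g0 @ d                   # should be negative (descent direction)
        lo, hi, t = 0.0, float("inf"), 1.0
        for _ in range(max_iter):
            ft, gt = f_and_grad(x + t * d)
            if ft > f0 + c1 * t * slope0:     # not enough decrease: shrink step
                hi = t
            elif gt @ d < c2 * slope0:        # curvature condition fails: expand
                lo = t
            else:
                return t                      # both weak Wolfe conditions hold
            t = 0.5 * (lo + hi) if hi < float("inf") else 2.0 * lo
        return t                              # best effort after max_iter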
References
9,498 citations
"Nonsmooth optimization via quasi-Ne..." refers methods in this paper
...We use ∂f(x) to denote the Clarke subdifferential [9,45] of f at x, which for locally Lipschitz f is simply the convex hull of the limits of gradients of f evaluated at sequences converging to x [6, Theorem 6....
[...]
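For reference, the definition quoted above is commonly written as follows (a standard form, relying on Rademacher's theorem; the symbol Ω_f is notation introduced here, not from the excerpt):

    \partial f(x) = \operatorname{conv}\Bigl\{\, \lim_{i \to \infty} \nabla f(x_i) \;:\; x_i \to x,\ x_i \notin \Omega_f \,\Bigr\},

where Ω_f is the set, Lebesgue-null by Rademacher's theorem, on which the locally Lipschitz function f fails to be differentiable.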
503 citations
"Nonsmooth optimization via quasi-Ne..." refers methods in this paper
...Motivated by the low overhead of quasi-Newton methods, Lukšan and Vlček proposed new methods intended to combine the global convergence properties of bundle methods [19,22] with the efficiency of quasi-Newton methods; Haarala [18] gives a good overview....
[...]
...The traditional approach to designing algorithms for nonsmooth optimization is to stabilize steepest descent by exploiting gradient or subgradient information evaluated at multiple points: this is the essential idea of bundle methods [19,22] and also of the gradient sampling algorithm [7,23]....
[...]
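The gradient sampling idea mentioned in this excerpt can be sketched concretely: sample gradients at the current point and at nearby randomly drawn points, then take minus the minimum-norm element of their convex hull as a stabilized descent direction. The sketch below is one illustrative realization (assuming SciPy's SLSQP solver for the small simplex QP; names and defaults are not from the paper):

    import numpy as np
    from scipy.optimize import minimize

    def gradient_sampling_direction(grad_f, x, eps=1e-4, m=None, rng=None):
        # Sample gradients at x and at m points drawn uniformly from the
        # eps-ball around x, then return minus the minimum-norm element
        # of the convex hull of the sampled gradients.
        rng = np.random.default_rng() if rng is None else rng
        n = x.size
        m = 2 * n if m is None else m
        u = rng.standard_normal((m, n))
        u /= np.linalg.norm(u, axis=1, keepdims=True)      # random directions
        u *= eps * rng.random((m, 1)) ** (1.0 / n)         # uniform radii in the ball
        G = np.array([grad_f(x)] + [grad_f(x + ui) for ui in u])   # (m+1, n)
        # Min-norm point of conv{g_0,...,g_m}: minimize ||G^T w||^2 over the simplex.
        k = len(G)
        res = minimize(lambda w: float(np.dot(G.T @ w, G.T @ w)),
                       np.full(k, 1.0 / k), method="SLSQP",
                       bounds=[(0.0, 1.0)] * k,
                       constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
        return -(G.T @ res.x)             # stabilized descent direction

Each gradient evaluation here is a full call to grad_f, which is why, as the excerpts below note, gradient sampling is far more computationally intensive than BFGS.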
477 citations
"Nonsmooth optimization via quasi-Ne..." refers methods in this paper
...This problem was one of the examples in [7]; in the results reported there, the objective function was defined without the logarithm and we enforced the semidefinite constraint by an exact penalty function....
[...]
...The gradient sampling method is far more computationally intensive than BFGS, but it does enjoy convergence guarantees with probability one [7,23]....
[...]
...If the algorithm breaks down (in practice) without satisfying the desired termination condition, the user has the option to continue the optimization using the gradient sampling method of [7]....
[...]