Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods
Summary (3 min read)
1 Introduction
- In Section 2, the authors consider functions satisfying the Kurdyka-Łojasiewicz inequality.
- The authors recover and improve previous results on gradient methods (Section 3) and proximal algorithms (Section 4).
- The convergence results the authors obtained involve different assumptions on the linear operator A: they either assume that ‖A‖ < 1 [11, Theorem 3] or that A satisfies the restricted isometry property [12, Theorem 4].
2.1 Some definitions from variational analysis
- The notion of subdifferential plays a central role in the theoretical and algorithmic developments that follow.
- The limiting processes used in an algorithmic context necessitate the introduction of the more stable notion of limiting subdifferential ([47]) (or simply subdifferential) of f.
- These generalized notions of differentiation give rise to a generalized notion of critical point.
- The authors end this section with a few words on an important class of functions intimately linked to projection mappings: the indicator functions.
2.2 Kurdyka-Łojasiewicz inequality: the nonsmooth case
- The authors begin this section with a brief discussion of real semi-algebraic sets and functions, which provide a very rich class of functions satisfying the Kurdyka-Łojasiewicz inequality.
- One easily sees that the class of semi-algebraic sets is stable under finite unions, finite intersections, Cartesian products, and complementation, and that polynomial functions are, of course, semi-algebraic.
- Of course, this result also holds when replacing sup by inf.
- Proper lower semicontinuous functions which satisfy the Kurdyka-Łojasiewicz inequality at each point of dom ∂f are called KL functions; the inequality itself is recalled after this list.
- Such examples are discussed at length in [5], and they strongly motivate the present study.
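For reference, the nonsmooth Kurdyka-Łojasiewicz inequality alluded to above takes the following standard form (this is the usual statement from the literature, with the desingularizing function denoted φ; it is a paraphrase, not a verbatim quote of the paper's definition):

```latex
% Kurdyka-Lojasiewicz inequality at \bar{x} \in \operatorname{dom}\partial f:
% there exist \eta \in (0,+\infty], a neighborhood U of \bar{x}, and a
% continuous concave function \varphi : [0,\eta) \to [0,+\infty) with
% \varphi(0)=0, \varphi \in C^1 on (0,\eta), and \varphi' > 0, such that
% for all x \in U with f(\bar{x}) < f(x) < f(\bar{x}) + \eta,
\varphi'\bigl(f(x)-f(\bar{x})\bigr)\,\operatorname{dist}\bigl(0,\partial f(x)\bigr)\;\ge\;1.
```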
2.3 An inexact descent convergence result for KL functions
- In the sequel, the authors consider sequences (xk)k∈N which satisfy the following conditions, subsequently referred to as H1, H2, H3. H1 (sufficient-decrease condition): f(xk+1) + a‖xk+1 − xk‖² ≤ f(xk) for some a > 0; H2 (relative-error condition): there exists wk+1 ∈ ∂f(xk+1) with ‖wk+1‖ ≤ b‖xk+1 − xk‖ for some b > 0; H3 (continuity condition): there exist a subsequence (xkj)j∈N and a point x̃ such that xkj → x̃ and f(xkj) → f(x̃) as j → ∞. (A numerical illustration of H1 and H2 follows this list.)
- Consider a sequence (xk)k∈N which satisfies conditions H1, H2.
- Simply reproduce the beginning of the proof of the previous lemma.
- Theorem 2.12 (Local convergence to global minima).
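To make H1 and H2 concrete, here is a small numerical check (not taken from the paper): plain gradient descent on a smooth polynomial objective satisfies an H1-type sufficient decrease and an H2-type relative-error bound along its trajectory. The objective, the step size t, and the constants a and b are illustrative assumptions.

```python
import numpy as np

# Gradient descent x_{k+1} = x_k - t * grad_f(x_k): since grad_f(x_{k+1}) is
# the (sub)gradient at x_{k+1} and x_{k+1} - x_k = -t * grad_f(x_k), H1 and H2
# can be verified directly along the iterates.

def f(x):            # a polynomial, hence semi-algebraic (and KL), objective
    return 0.25 * np.sum(x**4) + 0.5 * np.sum(x**2)

def grad_f(x):
    return x**3 + x

t, a, b = 0.05, 1.0, 20.0          # illustrative constants, not tuned
x = np.array([2.0, -1.5])
for k in range(100):
    x_new = x - t * grad_f(x)
    step = np.linalg.norm(x_new - x)
    # H1: f(x_{k+1}) + a * ||x_{k+1} - x_k||^2 <= f(x_k)
    assert f(x_new) + a * step**2 <= f(x) + 1e-12
    # H2: ||grad_f(x_{k+1})|| <= b * ||x_{k+1} - x_k||
    assert np.linalg.norm(grad_f(x_new)) <= b * step + 1e-12
    x = x_new
print("H1/H2 hold along the trajectory; last iterate:", x)
```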
3 Inexact gradient methods
- The first natural domain of application of their previous results concerns the simplest first-order methods, namely gradient methods.
- As the authors show, their abstract framework (Theorem 2.9) makes it possible to recover some of the results of [1].
- To illustrate the versatility of their algorithmic framework, the authors also consider a fairly general semi-algebraic feasibility problem and provide, in the spirit of [42], a local convergence proof for an inexact averaged projection method.
3.1 General convergence result
- To illustrate the variety of dynamics covered by Algorithm 1, let us show how variable metric gradient algorithms can be cast in this framework.
- This type of quadratic model arises, for instance, in trust-region methods (see [1], which is also connected to the Łojasiewicz inequality).
- For the convergence analysis of Algorithm 1, the authors of course use the elementary but important descent lemma (see, for example, [50, 3.2.12]).
- The authors then have the following result: Theorem 3.2.
- The sequence (xk)k∈N is assumed to be bounded. (A schematic sketch of such an inexact gradient iteration follows this list.)
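The following Python sketch shows one way an inexact gradient iteration in the spirit of Algorithm 1 can be organized: the gradient is perturbed by a relative error, and a sufficient-decrease test safeguards each step. The error model, the constant sigma, and the backtracking rule are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def inexact_gradient(f, grad_f, x0, t=0.05, sigma=1e-3, rel_err=0.2,
                     tol=1e-8, max_iter=1000, rng=np.random.default_rng(0)):
    """Gradient method with a relative error on the direction and an
    H1-type sufficient-decrease safeguard (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        # perturbation with norm at most rel_err * ||g|| (relative error)
        e = rng.standard_normal(x.shape)
        e *= rel_err * np.linalg.norm(g) / (np.linalg.norm(e) + 1e-16)
        x_new = x - t * (g + e)
        # sufficient-decrease test; halve the step until it holds
        while f(x_new) > f(x) - sigma * np.linalg.norm(x_new - x)**2:
            t *= 0.5
            x_new = x - t * (g + e)
        x = x_new
    return x

# usage on a simple polynomial (semi-algebraic) objective
x_star = inexact_gradient(lambda x: 0.25 * np.sum(x**4) + 0.5 * np.sum(x**2),
                          lambda x: x**3 + x, x0=[2.0, -1.5])
```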
3.2 Prox-regularity
- When considering nonconvex feasibility problems, the authors are led to consider squared distance functions to nonconvex sets.
- Contrary to what happens in the standard convex setting, such functions may fail to be differentiable.
- The key concept of prox-regularity provides a characterization of the local differentiability of these functions and, as the authors show in the next section, it in turn allows the design of averaged projection methods with interesting convergence properties.
- Let us gather the following definitions and properties concerning F that are fundamental for their purposes.
3.3 Averaged projections for feasibility problems
- Moreover, this sequence has finite length and converges to a feasible point x̄, i.e., a point such that x̄ ∈ F1 ∩ · · · ∩ Fp.
- Let us first observe that the function f (given by (23)) is semi-algebraic, because the distance function to any nonempty semi-algebraic set is semi-algebraic (see Lemma 2.3 or [30, 15]).
- Applying Corollary 2.7, the authors get xk+1 ∈ B(x∗, ρ), and their induction proof is complete.
- The sets Fi are assumed to have a linearly regular intersection at some point x̄, an important concept that originates from [47, Theorem 2.8]. (A sketch of the averaged projection iteration follows this list.)
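A hedged sketch of the (exact) averaged projection iteration: with f(x) = ½ Σi dist(x, Fi)², one has ∇f(x) = Σi (x − PFi(x)) wherever the projections are single-valued, so the gradient step xk+1 = xk − (θ/p)∇f(xk) averages the projections. The sets and the relaxation parameter θ below are illustrative assumptions.

```python
import numpy as np

def averaged_projections(projections, x0, theta=1.0, max_iter=500, tol=1e-10):
    """x+ = (1 - theta) * x + theta * average of P_i(x): the gradient step
    on f(x) = 0.5 * sum_i dist(x, F_i)^2 with step theta/p."""
    x = np.asarray(x0, dtype=float)
    p = len(projections)
    for _ in range(max_iter):
        avg = sum(P(x) for P in projections) / p
        x_new = (1 - theta) * x + theta * avg
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Example: intersect the unit circle (nonconvex, prox-regular) with the
# line {x : x[0] = 0.5} in R^2; the limit lies in both sets.
P_circle = lambda x: x / np.linalg.norm(x)
P_line   = lambda x: np.array([0.5, x[1]])
print(averaged_projections([P_circle, P_line], x0=[1.0, 1.0]))
```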
4 Inexact proximal algorithm
- Let us first recall the exact version of the proximal algorithm for nonconvex functions [36, 3].
- In view of the assumption inf f > −∞, the lower semicontinuity of f and the coercivity of the squared norm imply that proxλf has nonempty values.
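A minimal numerical sketch of this proximal iteration, assuming SciPy is available for the inner minimization: xk+1 minimizes y ↦ f(y) + ‖y − xk‖²/(2λ). Because the inner problem is solved only approximately by a generic routine, each step is in fact inexact, which is precisely the situation analyzed in Section 4.1. The objective and λ are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def prox_point(f, x0, lam=0.5, n_iter=50):
    """Proximal point iteration x_{k+1} in prox_{lam f}(x_k), with the inner
    minimization delegated to a generic (hence inexact) numerical solver."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        moreau = lambda y, xc=x: f(y) + np.sum((y - xc)**2) / (2 * lam)
        x = minimize(moreau, x).x      # inner solve, warm-started at x_k
    return x

f = lambda y: np.sum(y**4 - y**2)      # nonconvex, bounded below
print(prox_point(f, x0=[0.3, -0.2]))   # converges to a critical point of f
```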
4.1 Convergence of an inexact proximal algorithm for KL functions
- Let us introduce an inexact version of the proximal point method.
- The following elementary lemma is useful for the convergence analysis of the algorithm.
- Direct algebraic manipulation of the above inequality yields the first inequality.
4.2 A variant for convex functions
- When the function under consideration is convex and satisfies the Kurdyka-Łojasiewicz property, Algorithm 2 can be simplified while its convergence properties are maintained.
- Consider the sequence (xk)k∈N generated by the following algorithm.
- Many convex functions are KL functions in particular: this fact was a strong motivation for the above result.
5 Inexact forward-backward algorithm
- This kind of structured problem occurs frequently, see for instance [25, 6] and Example 5.4.
- In the first part, the authors recall the classical forward-backward algorithm and explain how Algorithm 3 provides an inexact version of it; the special case of projection methods is also discussed.
- The authors end this section by providing illustrations of their results through problems coming from compressive sensing and hard-constrained feasibility problems.
5.1 The forward-backward splitting algorithm for nonconvex functions
- With step sizes γk satisfying 0 < γ ≤ γk ≤ λ < 1/L, where γ and λ are given thresholds and L is a Lipschitz constant of ∇h, the forward-backward splitting algorithm reads xk+1 ∈ proxγkg(xk − γk∇h(xk)). (50)
- An important observation here is that the sequence is not uniquely defined, since proxγkg may be multivalued; a surprising fact is that this freedom in the choice of the sequence does not impact the convergence properties of the algorithm (see Theorem 5.1).
- Let us show how this algorithm fits into the general framework of Algorithm 3.
- As for the proximal algorithm, the inexact version offers some flexibility in the choice of xk+1 by relaxing both the descent condition and the optimality conditions.
- The authors thus obtain the nonconvex nonsmooth gradient-projection method xk+1 ∈ PC(xk − γk∇h(xk)). (51) A sketch of this iteration follows.
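A sketch of the gradient-projection iteration (51): when g is the indicator function of a closed set C, the prox reduces to the (possibly multivalued) projection PC. The set C (a sphere, hence nonconvex) and the function h below are illustrative assumptions.

```python
import numpy as np

def gradient_projection(grad_h, proj_C, x0, gamma=0.1, n_iter=200):
    """x+ = P_C(x - gamma * grad_h(x)); any selection of P_C is allowed."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = proj_C(x - gamma * grad_h(x))
    return x

# minimize h(x) = 0.5 * ||x - b||^2 over the unit sphere (nonconvex C)
b = np.array([3.0, 4.0])
x_star = gradient_projection(grad_h=lambda x: x - b,
                             proj_C=lambda x: x / np.linalg.norm(x),
                             x0=[1.0, 0.0])
print(x_star)    # approaches b / ||b|| = [0.6, 0.8]
```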
5.2 Convergence of an inexact forward-backward splitting algorithm
- Let us now return to the general inexact forward-backward splitting Algorithm 3, and show the following convergence result.
- The authors are precisely in the case which has been examined in Theorem 4.2 (continuous functions on their domain).
- Remark 5.2. (a) For the exact forward-backward splitting algorithm, the continuity assumption concerning g is unnecessary.
5.3 Examples
- Example 5.4 (Forward-backward splitting for compressive sensing).
- At the same time, they provide a very general convergence result which can be immediately generalized to compressive sensing problems involving semi-algebraic or real-analytic nonlinear measurements (a sketch of the specialized scheme appears after this list).
- By applying the forward-backward splitting algorithm to this problem, the authors aim at finding a point which satisfies the hard constraints modelled by F , while the other constraints are satisfied in a possibly weaker sense (see [25] and references therein).
- Let us now consider the KL analysis in the regular intersection case (see definition in Remark 3.6).
- To this end, the authors will use the following result [42, Proposition 8.5] (based itself on a characterization given in [37]).
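For the compressive sensing model of Example 5.4 with g(x) = λ‖x‖0 and h(x) = ½‖Ax − b‖², the forward-backward iteration specializes to componentwise hard thresholding (see the prox formula in Q14 below). The following sketch uses randomly generated illustrative data; it is not the paper's experiment.

```python
import numpy as np

def hard_threshold(u, tau):
    # a selection of the multivalued prox of tau*||.||_0: threshold sqrt(2*tau)
    return np.where(np.abs(u) > np.sqrt(2 * tau), u, 0.0)

def fbs_l0(A, b, lam, x0, n_iter=500):
    """Forward-backward splitting for lam*||x||_0 + 0.5*||Ax - b||^2."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad h
    gamma = 0.9 / L                        # step size below the 1/L threshold
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)           # forward (gradient) step on h
        x = hard_threshold(x - gamma * grad, gamma * lam)  # backward step on g
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100))
x_true = np.zeros(100)
x_true[[5, 17, 42]] = [1.5, -2.0, 1.0]
b = A @ x_true
print(np.nonzero(fbs_l0(A, b, lam=0.05, x0=np.zeros(100)))[0])  # support found
```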
7 Conclusion
- Very often, iterative minimization algorithms rely on the inexact solution of minimization subproblems, whose exact solution may be almost as difficult to obtain as that of the original problem.
- Even when a minimization subproblem can be solved with high accuracy, its solutions are mere approximations of the solution of the original problem.
- In these cases, over-solving the minimization subproblems would increase the computational burden of the method and may slow down the final computation of a good approximation of the solution.
- In particular, their abstract scheme was designed to handle relative errors, because practical methods always involve numerical approximation, e.g., the representation of real numbers as floating-point numbers of fixed byte length.
- Moreover, the authors also supplied general stopping criteria for the solution of the minimization subproblems.
Frequently Asked Questions (14)
Q2. What have the authors stated for future work in "Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods"?
The computational implementation of the methods analyzed in this paper, as well as these stopping rules are topics for future research.
Q3. What is the result of the inexact averaged projection algorithm?
If x0 is sufficiently close to F1 ∩ · · · ∩ Fp, then the inexact averaged projection algorithm reduces to the gradient method xk+1 = xk − (θ/p)∇f(xk) + εk, with f being given by (23), which therefore defines a unique sequence.
Q4. Why does the algorithm converge to a global minimizer?
Due to the fact that a convex function has at most one critical value, the bounded sequences generated by the above algorithms converge to a global minimizer.
Q5. What is the first condition intended to model?
The first condition is intended to model a descent property: since it involves a measure of the quality of the descent, the authors call it a sufficient-decrease condition (see [20] for an early paper on this subject, [7] for an interpretation of this condition in decision sciences, and [43] for a discussion of this type of condition in a particular nonconvex nonsmooth optimization setting).
Q6. What is the main assumption for the study of such algorithms?
In this paper, their central assumption for the study of such algorithms is that the function f satisfies the (nonsmooth) Kurdyka-Łojasiewicz inequality, which means, roughly speaking, that the functions under consideration are sharp up to a reparametrization (see Section 2.2).
Q7. How does the sequence of values (f(xk))k∈N behave?
If (xk)k∈N is bounded, then it converges to a minimizer of f, and the sequence of values f(xk) converges to the program value min f.
Q8. What is the importance of the Kurdyka-Łojasiewicz inequality?
In the context of optimization, the importance of the Kurdyka-Łojasiewicz inequality is due to the fact that many problems involve functions satisfying such inequalities, and it is often elementary to check that such an inequality is satisfied; real semi-algebraic functions provide a very rich class of functions satisfying the Kurdyka-Łojasiewicz inequality. See [5] for a thorough discussion of these aspects, and also Section 2.2 for a simple illustration.
Q9. What is assumed about the eigenvalues of the matrices A_i^k?
For each i in {1, . . . , p}, take a sequence of symmetric positive definite matrices (A_i^k)k∈N of size ni such that the eigenvalues of each A_i^k (k ∈ N, i ∈ {1, . . . , p}) lie in a fixed interval [λ, λ̄].
Q10. How can the authors solve the minimization subproblems?
In these cases, over-solving the minimization subproblems would increase the computational burden of the method, and may slow down the final computation of a good approximation of the solution.
Q11. Why is g + h a KL function?
To see that g + h is a KL function, the authors simply note that h is a polynomial function and that ‖ · ‖0 has a piecewise linear graph, hence the sum g + h is semi-algebraic.
Q12. What is the effect of under-solving the minimization subproblems?
On the other hand, under-solving the minimization subproblems may result in a breakdown of the algorithm, and convergence to a solution may be lost.
Q13. What is the subdifferential of f at x ∈ dom f?
The subdifferential of f at x ∈ dom f, written ∂f(x), is defined as follows: ∂f(x) := {v ∈ Rn : ∃ xk → x, f(xk) → f(x), vk ∈ ∂̂f(xk), vk → v}.
Q14. What is the proximal map of the counting norm when n = 1?
When n = 1, the counting norm is denoted by |·|0; in that case one easily establishes that proxγλ|·|0(u) = u if |u| > √(2γλ), {0, u} if |u| = √(2γλ), and 0 otherwise.
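A tiny sketch of this scalar prox, returning the full (possibly two-element) solution set at the tie point; the function name and interface are chosen for illustration.

```python
import numpy as np

def prox_counting(u, gamma, lam):
    """Prox of y -> gamma*lam*|y|_0 at u: compare the cost gamma*lam of
    keeping u with the cost 0.5*u^2 of setting it to zero."""
    t = np.sqrt(2 * gamma * lam)   # tie point where both costs are equal
    if abs(u) > t:
        return {u}
    if abs(u) == t:
        return {0.0, u}
    return {0.0}

print(prox_counting(2.0, 0.5, 1.0))   # threshold t = 1.0 -> {2.0}
print(prox_counting(1.0, 0.5, 1.0))   # |u| == t -> {0.0, 1.0}
```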