Journal ArticleDOI

Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods

TL;DR: This work proves an abstract convergence result for descent methods that satisfy a sufficient-decrease assumption and allow a relative error tolerance; the result guarantees the convergence of bounded sequences under the assumption that the function f satisfies the Kurdyka–Łojasiewicz inequality.
Abstract: In view of the minimization of a nonsmooth nonconvex function f, we prove an abstract convergence result for descent methods satisfying a sufficient-decrease assumption, and allowing a relative error tolerance. Our result guarantees the convergence of bounded sequences, under the assumption that the function f satisfies the Kurdyka–Łojasiewicz inequality. This assumption allows to cover a wide range of problems, including nonsmooth semi-algebraic (or more generally tame) minimization. The specialization of our result to different kinds of structured problems provides several new convergence results for inexact versions of the gradient method, the proximal method, the forward–backward splitting algorithm, the gradient projection and some proximal regularization of the Gauss–Seidel method in a nonconvex setting. Our results are illustrated through feasibility problems, or iterative thresholding procedures for compressive sensing.

Summary (3 min read)

1 Introduction

  • In Section 2, the authors consider functions satisfying the Kurdyka–Łojasiewicz inequality.
  • The authors recover and improve previous works on the question of gradient methods (Section 3) and proximal algorithms (Section 4).
  • The convergence results the authors obtained involve different assumptions on the linear operator A: they either assume that ‖A‖ < 1 [11, Theorem 3] or that A satisfies the restricted isometry property [12, Theorem 4].

2.1 Some definitions from variational analysis

  • The notion of subdifferential plays a central role in the following theoretical and algorithm developments.
  • The limiting processes used in an algorithmic context necessitate the introduction of the more stable notion of limiting-subdifferential ([47]) (or simply subdifferential) of f .
  • These generalized notions of differentiation give birth to generalized notions of critical point.
  • The authors end this section by some words on an important class of functions which are intimately linked to projection mappings: the indicator functions.
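
To make the link with projections concrete: for a nonempty closed set C, the proximal mapping of the indicator function of C is the (possibly multivalued) projection onto C. The snippet below is a minimal illustration, assuming C is a box so that the projection is a componentwise clip; the function names are ours, not the paper's.

```python
import numpy as np

def prox_indicator_box(x, lo, hi, gamma=1.0):
    """Prox of gamma * (indicator of the box [lo, hi]^n).

    The indicator is 0 on the box and +infinity outside, so the prox does not
    depend on gamma and coincides with the Euclidean projection (a clip).
    """
    return np.clip(x, lo, hi)

print(prox_indicator_box(np.array([1.7, -0.3, 0.4]), lo=0.0, hi=1.0))  # [1.  0.  0.4]
```
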

2.2 Kurdyka–Łojasiewicz inequality: the nonsmooth case

  • The authors begin this section with a brief discussion of real semi-algebraic sets and functions, which provide a very rich class of functions satisfying the Kurdyka–Łojasiewicz inequality.
  • One easily sees that the class of semi-algebraic sets is stable under the operation of finite union, finite intersection, Cartesian product or complementation and that polynomial functions are, of course, semi-algebraic functions.
  • Of course, this result also holds when replacing sup by inf.
  • Proper lower semicontinuous functions which satisfy the Kurdyka–Łojasiewicz inequality at each point of dom ∂f are called KL functions.
  • Such examples are abundantly commented in [5], and they strongly motivate the present study.

2.3 An inexact descent convergence result for KL functions

  • In the sequel, the authors consider sequences (xk)k∈N which satisfy the following conditions, which they will subsequently refer to as H1, H2, and H3.
  • Consider a sequence (xk)k∈N which satisfies conditions H1, H2.
  • Simply reproduce the beginning of the proof of the previous lemma.
  • Theorem 2.12 (Local convergence to global minima).

3 Inexact gradient methods

  • The first natural domain of application of their previous results concerns the simplest first-order methods, namely the gradient methods.
  • As the authors shall see, their abstract framework (Theorem 2.9) allows them to recover some of the results of [1].
  • In order to illustrate the versatility of their algorithmic framework, the authors also consider a fairly general semi-algebraic feasibility problem, and they provide, in the line of [42], a local convergence proof for an inexact averaged projection method.

3.1 General convergence result

  • To illustrate the variety of dynamics covered by Algorithm 1, let us show how variable metric gradient algorithms can be cast in this framework (a toy sketch follows this list).
  • This type of quadratic model arises, for instance, in trust-region methods (see [1], which is also connected to the Łojasiewicz inequality).
  • For the convergence analysis of Algorithm 1, the authors shall of course use the elementary but important descent lemma (see for example [50, 3.2.12]).
  • The authors then have the following result: Theorem 3.2.
  • The sequence (xk)k∈N has been assumed to be bounded.
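
As a hedged illustration of the variable metric gradient dynamics mentioned above (an assumption-laden sketch with a diagonal metric on a toy quadratic, not the paper's Algorithm 1):

```python
import numpy as np

def variable_metric_gradient_step(grad_f, x, metric_diag, gamma):
    """One variable metric gradient step x+ = x - gamma * M^{-1} grad_f(x),
    where the metric M is diagonal and positive definite (given by its diagonal)."""
    return x - gamma * grad_f(x) / metric_diag

# Toy quadratic f(x) = 0.5 * x^T diag(q) x, whose gradient is q * x.
q = np.array([1.0, 10.0])
grad_f = lambda x: q * x
x = np.array([1.0, 1.0])
for _ in range(20):
    # Using M = diag(q) gives a damped Newton-like scaling of the gradient.
    x = variable_metric_gradient_step(grad_f, x, metric_diag=q, gamma=0.5)
print(x)   # both coordinates shrink at the same rate and tend to the minimizer 0
```
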

3.2 Prox-regularity

  • When considering nonconvex feasibility problems, the authors are led to consider squared distance functions to nonconvex sets.
  • Contrary to what happens in the standard convex setting, such functions may fail to be differentiable.
  • The key concept of prox-regularity provides a characterization of the local differentiability of these functions and, as the authors will see in the next section, it in turn allows the design of averaged projection methods with interesting convergence properties.
  • Let us gather the following definition/properties concerning F that are fundamental for their purpose.

3.3 Averaged projections for feasibility problems

  • Moreover, this sequence has finite length and converges to a feasible point x̄, i.e. such that x̄ ∈ ⋂_{i=1}^p Fi (a toy sketch of averaged projections follows this list).
  • Let us first observe that the function f (given by (23)) is semi-algebraic, because the distance function to any nonempty semi-algebraic set is semi-algebraic (see Lemma 2.3 or [30, 15]).
  • Applying now Corollary 2.7, the authors get xk+1 ∈ B(x∗, ρ) and their induction proof is complete.
  • The sets Fi are assumed to have a linearly regular intersection at some point x̄, an important concept that originates from [47, Theorem 2.8].
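
The averaged projection iteration of this section can be sketched on a toy feasibility problem. The example below is our illustration, with θ = 1 and two semi-algebraic sets in the plane (the unit circle and the line x₁ = x₂); with this choice the update x − (θ/p)∇f(x), with f = ½ Σ dist(·, Fi)², is exactly the average of the two projections. The sets and the starting point are assumptions of the sketch.

```python
import numpy as np

def proj_circle(x):
    """Projection onto the unit circle {x : ||x|| = 1} (single-valued away from 0)."""
    return x / np.linalg.norm(x)

def proj_diagonal(x):
    """Projection onto the line {x : x[0] == x[1]}."""
    m = 0.5 * (x[0] + x[1])
    return np.array([m, m])

x = np.array([1.2, 0.3])                               # start near the intersection
for _ in range(200):
    x = 0.5 * (proj_circle(x) + proj_diagonal(x))      # averaged projections, theta = 1
print(x, abs(np.linalg.norm(x) - 1.0), abs(x[0] - x[1]))  # close to (1/sqrt(2), 1/sqrt(2))
```
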

4 Inexact proximal algorithm

  • Let us first recall the exact version of the proximal algorithm for nonconvex functions [36, 3].
  • In view of the assumption inf f > −∞, the lower semicontinuity of f and the coercivity of the squared norm imply that proxλf has nonempty values.

4.1 Convergence of an inexact proximal algorithm for KL functions

  • Let us introduce an inexact version of the proximal point method (a toy sketch follows this list).
  • The following elementary lemma is useful for the convergence analysis of the algorithm.
  • Direct algebraic manipulation of the above inequality yields the first inequality.
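
As a hedged sketch of the inexact idea (our illustration, not the paper's Algorithm 2): each proximal subproblem min_u f(u) + ‖u − x^k‖²/(2λ) is attacked by inner gradient steps and stopped by a simple relative rule (subproblem gradient small compared with the step length), which is one way to avoid over-solving the subproblem. The stopping constant b, the stepsizes, and the test function are assumptions of the sketch.

```python
import numpy as np

def inexact_prox_step(grad_f, x, lam, b=0.5, inner_lr=None, max_inner=200):
    """Approximate prox_{lam*f}(x) for a smooth f: inner gradient descent on
    u -> f(u) + ||u - x||^2 / (2*lam), stopped once the subproblem gradient is
    small relative to the step length ||u - x|| (a relative-error rule)."""
    u = x.copy()
    if inner_lr is None:
        inner_lr = lam / 2.0                      # crude inner stepsize (assumption)
    for _ in range(max_inner):
        g = grad_f(u) + (u - x) / lam             # gradient of the regularized subproblem
        if not np.allclose(u, x) and np.linalg.norm(g) <= b * np.linalg.norm(u - x):
            break                                  # accurate enough: stop the inner solver
        u = u - inner_lr * g
    return u

# Toy run on f(x) = 0.5*||x||^2 (gradient x); the exact prox would be x / (1 + lam).
grad_f = lambda x: x
x = np.array([4.0, -2.0])
for _ in range(30):
    x = inexact_prox_step(grad_f, x, lam=1.0)     # inexact proximal point iteration
print(x)   # tends to the minimizer 0 of f
```
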

4.2 A variant for convex functions

  • When the function under consideration is convex and satisfies the Kurdyka–Łojasiewicz property, Algorithm 2 can be simplified while its convergence properties are maintained.
  • Consider the sequence (xk)k∈N generated by the following algorithm.
  • In particular, many convex functions are KL functions: this fact was a strong motivation for the above result.

5 Inexact forward-backward algorithm

  • This kind of structured problem occurs frequently, see for instance [25, 6] and Example 5.4.
  • In a first part, the authors recall the classical forward-backward algorithm and explain how Algorithm 3 provides an inexact version of it; the special case of projection methods is also discussed.
  • The authors end this section by providing illustrations of their results through problems coming from compressive sensing, and hard-constrained feasibility problems.

5.1 The forward-backward splitting algorithm for nonconvex functions

  • With γ and λ given thresholds, the forward-backward splitting algorithm reads x^{k+1} ∈ prox_{γ_k g}(x^k − γ_k ∇h(x^k)). (50)
  • An important observation here is that the sequence is not uniquely defined since prox_{γ_k g} may be multivalued; a surprising fact is that this freedom in the choice of the sequence does not impact the convergence properties of the algorithm (see Theorem 5.1).
  • Let us show how this algorithm fits into the general framework of Algorithm 3.
  • As for the proximal algorithm, the inexact version offers some flexibility in the choice of xk+1 by relaxing both the descent condition and the optimality conditions.
  • The authors thus find the nonconvex nonsmooth gradient-projection method x^{k+1} ∈ P_C(x^k − γ_k ∇h(x^k)). (51)
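
A minimal sketch of the gradient projection special case (51), assuming C is the unit sphere (a closed semi-algebraic set) so that P_C is just a normalization, and a quadratic h; the matrix Q and the stepsize are illustrative choices, not taken from the paper.

```python
import numpy as np

def gradient_projection_sphere(grad_h, x0, gamma, iters=500):
    """Nonconvex gradient projection x_{k+1} in P_C(x_k - gamma * grad_h(x_k)),
    where C is the unit sphere and P_C(y) = y / ||y|| for y != 0."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        y = x - gamma * grad_h(x)
        x = y / np.linalg.norm(y)      # projection onto the sphere
    return x

# Toy problem: minimize h(x) = 0.5 * x^T Q x over the unit sphere.
Q = np.diag([3.0, 1.0, 0.2])
grad_h = lambda x: Q @ x
x_star = gradient_projection_sphere(grad_h, np.array([1.0, 1.0, 1.0]), gamma=0.3)
print(x_star)   # approaches (0, 0, 1): an eigenvector for the smallest eigenvalue of Q
```
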

5.2 Convergence of an inexact forward-backward splitting algorithm

  • Let us now return to the general inexact forward-backward splitting Algorithm 3, and show the following convergence result.
  • The authors are precisely in the case which has been examined in Theorem 4.2 (continuous functions on their domain).
  • Remark 5.2. (a) For the exact forward-backward splitting algorithm, the continuity assumption concerning g is not needed.

5.3 Examples

  • Example 5.4 (Forward-backward splitting for compressive sensing).
  • They also provide at the same time a very general convergence result which can be immediately generalized to compressive sensing problems involving semi-algebraic or real-analytic nonlinear measurements.
  • By applying the forward-backward splitting algorithm to this problem, the authors aim at finding a point which satisfies the hard constraints modelled by F , while the other constraints are satisfied in a possibly weaker sense (see [25] and references therein).
  • Let us now consider the KL analysis in the regular intersection case (see definition in Remark 3.6).
  • To this end, the authors will use the following result [42, Proposition 8.5] (based itself on a characterization given in [37]).

7 Conclusion

  • Very often, iterative minimization algorithms rely on inexact solution of minimization subproblems, whose exact solution may be almost as difficult to obtain as the solution of the original minimization problem.
  • Even when the minimization subproblem can be solved with high accuracy, its solutions are mere approximations of the solution of the original problems.
  • In these cases, over-solving the minimization subproblems would increase the computational burden of the method, and may slow down the final computation of a good approximation of the solution.
  • In particular, their abstract scheme was designed to handle relative errors because practical methods always involve numerical approximation, e.g., the representation of a real number as a floating-point number with a fixed byte length.
  • Moreover, the authors also supplied stopping criteria for the solution of the minimization subproblems in general.


HAL Id: hal-00790042
https://hal.archives-ouvertes.fr/hal-00790042
Submitted on 19 Feb 2013
Convergence of descent methods for semi-algebraic and
tame problems: proximal algorithms, forward-backward
splitting, and regularized Gauss-Seidel methods
Hedy Attouch, Jérôme Bolte, Benar Fux Svaiter
To cite this version:
Hedy Attouch, Jérôme Bolte, Benar Fux Svaiter. Convergence of descent methods for semi-algebraic
and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel
methods. Mathematical Programming, Series A, Springer, 2011, 137 (1), pp.91-124. ⟨10.1007/s10107-011-0484-9⟩. ⟨hal-00790042⟩

Convergence of descent methods for semi-algebraic
and tame problems: proximal algorithms,
forward-backward splitting, and regularized
Gauss-Seidel methods
Hedy ATTOUCH¹
Jérôme BOLTE²
Benar Fux SVAITER³
December 15, 2010; revised July 21, 2011

¹ I3M UMR CNRS 5149, Université Montpellier II, Place Eugène Bataillon, 34095 Montpellier, France (attouch@math.univ-montp2.fr). Partially supported by ANR-08-BLAN-0294-03.
² TSE (GREMAQ, Université Toulouse I), Manufacture des Tabacs, 21 allée de Brienne, Toulouse, France (jerome.bolte@tse-eu.fr). Partially supported by ANR-08-BLAN-0294-03.
³ IMPA, Estrada Dona Castorina 110, 22460-320 Rio de Janeiro, Brazil (benar@impa.br). Partially supported by CNPq grants 474944/2010-7, 303583/2008-8, FAPERJ grant E-26/102.821/2008 and PRONEX-Optimization.
Abstract In view of the minimization of a nonsmooth nonconvex function f, we prove an
abstract convergence result for descent methods satisfying a sufficient-decrease assumption,
and allowing a relative error tolerance. Our result guarantees the convergence of bounded
sequences, under the assumption that the function f satisfies the Kurdyka–Łojasiewicz in-
equality. This assumption allows to cover a wide range of problems, including nonsmooth
semi-algebraic (or more generally tame) minimization. The specialization of our result to
different kinds of structured problems provides several new convergence results for inexact
versions of the gradient method, the proximal method, the forward-backward splitting algo-
rithm, the gradient projection and some proximal regularization of the Gauss-Seidel method
in a nonconvex setting. Our results are illustrated through feasibility problems, or iterative
thresholding procedures for compressive sensing.
2010 Mathematics Subject Classification: 34G25, 47J25, 47J30, 47J35, 49M15, 49M37,
65K15, 90C25, 90C53.
Keywords: Nonconvex nonsmooth optimization, semi-algebraic optimization, tame opti-
mization, Kurdyka–Łojasiewicz inequality, descent methods, relative error, sufficient decrease,
forward-backward splitting, alternating minimization, proximal algorithms, iterative thresh-
olding, block-coordinate methods, o-minimal structures.
1 Introduction
Being given a proper lower semicontinuous function f : ℝⁿ → ℝ ∪ {+∞}, we consider descent methods that generate sequences (x^k)_{k∈ℕ} complying with the following conditions:

  • for each k ∈ ℕ,  f(x^{k+1}) + a‖x^{k+1} − x^k‖² ≤ f(x^k);
  • for each k ∈ ℕ,  there exists w^{k+1} ∈ ∂f(x^{k+1}) such that ‖w^{k+1}‖ ≤ b‖x^{k+1} − x^k‖;

where a, b are positive constants and ∂f(x^{k+1}) denotes the set of limiting subgradients of f at x^{k+1}
(see Section 2.1 for a definition). The first condition is intended to model a descent
property: since it involves a measure of the quality of the descent, we call it a sufficient-
decrease condition (see [20] for an early paper on this subject, [7] for an interpretation of
this condition in decision sciences, and [43] for a discussion on this type of condition in
a particular nonconvex nonsmooth optimization setting). The second condition originates
from the well-known fact that most algorithms in optimization are generated by an infinite
sequence of subproblems which involve exact or inexact minimization processes. This is
the case of gradient methods, Newton’s method, forward-backward algorithm, Gauss-Seidel
method, proximal methods... The second set of conditions precisely reflects relative inexact
optimality conditions for such minimization subproblems.
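
For a single iteration these two conditions are straightforward to check numerically. The helper below is a minimal sketch (our code, with illustrative constants), verifying the sufficient-decrease and relative-error inequalities for a pair of consecutive iterates and a caller-supplied (sub)gradient w^{k+1}:

```python
import numpy as np

def satisfies_descent_conditions(f, x_prev, x_next, w_next, a, b, tol=1e-12):
    """Check the two abstract conditions for one iteration:
    sufficient decrease: f(x_next) + a*||x_next - x_prev||^2 <= f(x_prev),
    relative error:      ||w_next|| <= b*||x_next - x_prev||,
    where w_next is a (sub)gradient of f at x_next supplied by the caller."""
    step = np.linalg.norm(x_next - x_prev)
    decrease_ok = f(x_next) + a * step**2 <= f(x_prev) + tol
    error_ok = np.linalg.norm(w_next) <= b * step + tol
    return decrease_ok and error_ok

# Toy check: one gradient step on f(x) = 0.5*||x||^2 (gradient x, 1-Lipschitz).
f = lambda x: 0.5 * np.dot(x, x)
x0 = np.array([1.0, -2.0])
gamma = 0.5
x1 = x0 - gamma * x0          # exact gradient step
w1 = x1                       # gradient of f at x1
print(satisfies_descent_conditions(f, x0, x1, w1, a=1.0 / (2 * gamma), b=2.0))  # True
```
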
When dealing with descent methods for convex functions, it became natural to expect that
the algorithm will provide globally convergent sequences (i.e., for arbitrary starting point,
the algorithm generates a sequence that converges to a solution). The standard recipe to
obtain the convergence is to prove that the sequence is (quasi-)Fejér monotone relative to the
set of minimizers of f. This fact has also been used intensively in the study of algorithms
for nonexpansive mappings (see e.g. [24]). When the functions under consideration are not
convex (or quasiconvex), the monotonicity properties are in general “broken”, and descent
methods may provide sequences that exhibit highly oscillatory behaviors. Apparently this
phenomenon was first observed by Curry (see [27]); in the framework of differential equations
similar behaviors occur: in [28], a nonconverging bounded curve of a 2-dimensional gradient system of a C^∞ function is provided, and this example was adapted in [1] to gradient methods.
In order to circumvent such behaviors, it seems necessary to work with functions that
present a certain structure. This structure can be of an algebraic nature, e.g. quadratic func-
tions, polynomial functions, real analytic functions, but it can also be captured by adequate
analytic assumptions, e.g. metric regularity [2, 41, 42], cohypomonotonicity [51, 36], self-
concordance [49], partial smoothness [40, 59]. In this paper, our central assumption for the
study of such algorithms is that the function f satisfies the (nonsmooth) Kurdyka–Łojasiewicz
inequality, which means, roughly speaking, that the functions under consideration are sharp
up to a reparametrization (see Section 2.2). The reader is referred to [44, 45, 38] for the
smooth cases, and to [15, 17] for nonsmooth inequalities. Kurdyka–Łojasiewicz inequalities
have been successfully used to analyze various types of asymptotic behavior: gradient-like sys-
tems [15, 34, 35, 39], PDE [55, 22], gradient methods [1, 48], proximal methods [3], projection
methods or alternating methods [5, 14].
In the context of optimization, the importance of the Kurdyka–Łojasiewicz inequality is due
to the fact that many problems involve functions satisfying such inequalities, and it is often
elementary to check that such an inequality is satisfied; real semi-algebraic functions provide a very rich class of functions satisfying the Kurdyka–Łojasiewicz inequality; see [5] for a thorough discussion on these aspects, and also Section 2.2 for a simple illustration.
Many other functions that are met in real-world problems, and which are not semi-algebraic, very often satisfy the Kurdyka–Łojasiewicz inequality. An important class is given
by functions definable in an o-minimal structure. The monographs [26, 30] are good refer-
ences on o-minimal structures; concerning Kurdyka–Łojasiewicz inequalities in this context
the reader is referred to [38, 17]. Functions definable in o-minimal structures or functions
whose graphs are locally definable are often called tame functions. We do not give a precise
definition of definability in this work, but the flexibility of this concept is briefly illustrated
in Example 5.4(b). Functions that are not necessarily tame but that satisfy the Łojasiewicz inequality are given in [5]; basic assumptions involve metric regularity and transversality (see
also [41, 42] and Example 5.5).
From a technical viewpoint, our work blends the approach to nonconvex problems pro-
vided in [1, 15, 3, 5] with the relative error philosophy developed in [56, 57, 58, 36]. A
valuable guideline for the error aspects is the development of an inexact proximal algorithm
for equations governed by a monotone operator, which is based on an estimation of the relative error; see [56, 57, 58]. Related results without monotonicity (with a control on the
lack of monotonicity) have been obtained in [36].
Thus, in summary, this article aims at:
  • providing a unified framework for the analysis of classical descent methods,
  • relaxing exact descent conditions,
  • extending convergence results obtained in [1, 3, 5, 56, 57, 58, 36] to richer and more flexible algorithms,
  • providing theorems which cover general nonsmooth problems under easily verifiable assumptions (e.g. semi-algebraicity).
Let us proceed with a more precise description of the contents of this article.
In Section 2, we consider functions satisfying the Kurdyka–Łojasiewicz inequality. We
first give the definition and a brief analysis of this basic property. Then in subsection 2.3,
we provide an abstract convergence result for sequences satisfying the sufficient-decrease
condition and the relative inexact optimality condition mentioned above.
This result is then applied to the analysis of several descent methods with relative error
tolerance.
We recover and improve previous works on the question of gradient methods (Section 3)
and proximal algorithms (Section 4). Our results are illustrated through semi-algebraic fea-
sibility problems by means of an inexact version of the averaged projection method.
We also provide, in Section 5, an in-depth analysis of forward-backward splitting algo-
rithms in a nonsmooth nonconvex setting. Setting aside the convex case, we did not find
any general convergence results for this kind of algorithm, so the results we present here
seem to be new. These results can be applied to general semi-algebraic problems (or tame
problems) and to nonconvex problems presenting a well-conditioned structure. An important
and enlightening consequence of our study is that the bounded sequences (x^k)_{k∈ℕ} generated by the nonconvex gradient projection algorithm

x^{k+1} ∈ P_C( x^k − (1/(2L)) ∇h(x^k) )

are convergent sequences so long as C is a closed semi-algebraic subset of ℝⁿ and h : ℝⁿ → ℝ is C¹ semi-algebraic with L-Lipschitz gradient (see [9] for some applications in signal processing). As an application of our general results on forward-backward splitting, we
consider the following type of problem
(P)   min { λ‖x‖₀ + (1/2)‖Ax − b‖² : x ∈ ℝⁿ },

where λ > 0 and ‖·‖₀ is the counting norm (or the ℓ₀ norm), A is an m × n real matrix and b ∈ ℝᵐ. We recall that for x in ℝⁿ, ‖x‖₀ is the number of nonzero components of x. This kind of problem is central in compressive sensing [29]. In [11, 12] this problem is tackled by using a "hard iterative thresholding" algorithm

x^{k+1} ∈ prox_{γ_k λ‖·‖₀}( x^k − γ_k (AᵀA x^k − Aᵀb) ),

where (γ_k)_{k∈ℕ} is a sequence of stepsizes evolving in a convenient interval (the definition of the proximal mapping prox_{λf} is given in Section 4). The convergence results the authors obtained involve different assumptions on the linear operator A: they either assume that ‖A‖ < 1 [11, Theorem 3] or that A satisfies the restricted isometry property [12, Theorem 4]. Our results show that convergence actually occurs for any linear map so long as the sequence (x^k)_{k∈ℕ} is bounded. We also consider iterative thresholding with ℓ_p "norms" for sparse approximation (in the spirit of [21]) and hard-constrained feasibility problems; in both cases convergence of the bounded sequences is established.
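
A minimal sketch of the hard iterative thresholding scheme above (our code; the random data, λ and the stepsize are illustrative assumptions, and recovery of the true support is not guaranteed). The prox of γλ‖·‖₀ acts componentwise: it keeps an entry u_i when |u_i| > √(2γλ) and sets it to zero otherwise (one valid selection at the boundary).

```python
import numpy as np

def prox_l0(u, gamma, lam):
    """Componentwise prox of gamma*lam*||.||_0: zero out entries with |u_i| <= sqrt(2*gamma*lam)."""
    v = u.copy()
    v[np.abs(u) <= np.sqrt(2.0 * gamma * lam)] = 0.0
    return v

def iterative_hard_thresholding(A, b, lam, gamma, x0, iters=2000):
    """x_{k+1} in prox_{gamma*lam*||.||_0}( x_k - gamma*(A^T A x_k - A^T b) )."""
    x = x0.copy()
    for _ in range(iters):
        x = prox_l0(x - gamma * (A.T @ (A @ x) - A.T @ b), gamma, lam)
    return x

# Toy instance of (P): random 20 x 50 measurements of a 3-sparse vector.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [2.0, -1.5, 1.0]
b = A @ x_true
gamma = 0.9 / np.linalg.norm(A, 2) ** 2     # stepsize below 1 / ||A||^2 (assumption)
x_hat = iterative_hard_thresholding(A, b, lam=0.05, gamma=gamma, x0=np.zeros(50))
print(np.nonzero(x_hat)[0])                  # indices of the nonzero entries of the final iterate
```
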
In the last section, we study the proximal regularization of a p-block alternating method (with p ≥ 2). This method has been introduced by Auslender [8] for convex minimization; see also [32] in a nonconvex setting. Convergence results for such methods are usually stated in terms of cluster points. To our knowledge, the first convergence result in a nonconvex setting, under fairly general assumptions, was obtained in [5] for a two-block exact version. Our generalization is twofold: we consider methods involving an arbitrary number of blocks, and we provide a proper convergence result.
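
A hedged two-block sketch of the proximally regularized alternating scheme (p = 2, with a smooth strongly convex toy objective so that each regularized block subproblem has a closed form; the objective and μ are our assumptions, not the general setting of the paper):

```python
def prox_regularized_gauss_seidel(x, y, mu=1.0, iters=100):
    """Two-block proximal regularization of Gauss-Seidel on
    f(x, y) = 0.5*(x - 1)**2 + 0.5*(y + 2)**2 + 0.25*x*y.
    Each block update minimizes f plus (mu/2) times the squared distance to the
    previous block value; here both subproblems are solved in closed form."""
    for _ in range(iters):
        # x-block optimality: (x - 1) + 0.25*y + mu*(x - x_old) = 0
        x = (1.0 - 0.25 * y + mu * x) / (1.0 + mu)
        # y-block optimality: (y + 2) + 0.25*x + mu*(y - y_old) = 0
        y = (-2.0 - 0.25 * x + mu * y) / (1.0 + mu)
    return x, y

print(prox_regularized_gauss_seidel(0.0, 0.0))   # approaches (1.6, -2.4), the unique minimizer
```
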
2 An abstract convergence result for inexact descent methods
The Euclidean scalar product of ℝⁿ and its corresponding norm are respectively denoted by ⟨·, ·⟩ and ‖·‖.
2.1 Some definitions from variational analysis
Standard references are [23, 54, 47].
If F : ℝⁿ ⇉ ℝᵐ is a point-to-set mapping, its graph is defined by

Graph F := {(x, y) ∈ ℝⁿ × ℝᵐ : y ∈ F(x)},

while its domain is given by dom F := {x ∈ ℝⁿ : F(x) ≠ ∅}. Similarly, the graph of a real-extended-valued function f : ℝⁿ → ℝ ∪ {+∞} is defined by

Graph f := {(x, s) ∈ ℝⁿ × ℝ : s = f(x)},

and its domain by dom f := {x ∈ ℝⁿ : f(x) < +∞}. The epigraph of f is defined as usual as

epi f := {(x, λ) ∈ ℝⁿ × ℝ : f(x) ≤ λ}.

Citations
Journal ArticleDOI
TL;DR: A self-contained convergence analysis framework is derived and it is established that each bounded sequence generated by PALM globally converges to a critical point.
Abstract: We introduce a proximal alternating linearized minimization (PALM) algorithm for solving a broad class of nonconvex and nonsmooth minimization problems. Building on the powerful Kurdyka–Łojasiewicz property, we derive a self-contained convergence analysis framework and establish that each bounded sequence generated by PALM globally converges to a critical point. Our approach allows to analyze various classes of nonconvex-nonsmooth problems and related nonconvex proximal forward–backward algorithms with semi-algebraic problem's data, the latter property being shared by many functions arising in a wide variety of fundamental applications. A by-product of our framework also shows that our results are new even in the convex setting. As an illustration of the results, we derive a new and simple globally convergent algorithm for solving the sparse nonnegative matrix factorization problem.

1,563 citations

Journal ArticleDOI
TL;DR: This paper studies an alternative inexact BCD approach which updates the variable blocks by successively minimizing a sequence of approximations of f which are either locally tight upper bounds of $f$ or strictly convex local approximation of f.
Abstract: The block coordinate descent (BCD) method is widely used for minimizing a continuous function $f$ of several block variables. At each iteration of this method, a single block of variables is optimized, while the remaining variables are held fixed. To ensure the convergence of the BCD method, the subproblem of each block variable needs to be solved to its unique global optimal. Unfortunately, this requirement is often too restrictive for many practical scenarios. In this paper, we study an alternative inexact BCD approach which updates the variable blocks by successively minimizing a sequence of approximations of $f$ which are either locally tight upper bounds of $f$ or strictly convex local approximations of $f$. The main contributions of this work include the characterizations of the convergence conditions for a fairly wide class of such methods, especially for the cases where the objective functions are either nondifferentiable or nonconvex. Our results unify and extend the existing convergence results ...

1,032 citations


Additional excerpts

  • ...It is also worth noting that the convergence of the alternating proximal minimization algorithm is also studied in [1] for Kurdyka–Łojasiewicz functions....

    [...]

Journal ArticleDOI
TL;DR: In this paper, the convergence of the alternating direction method of multipliers (ADMM) for minimizing a nonconvex and possibly nonsmooth objective function is analyzed, subject to coupled linear equality constraints.
Abstract: In this paper, we analyze the convergence of the alternating direction method of multipliers (ADMM) for minimizing a nonconvex and possibly nonsmooth objective function, φ(x₀, …, x_p, y), subject to coupled linear equality constraints. Our ADMM updates each of the primal variables x₀, …, x_p, y, followed by updating the dual variable. We separate the variable y from the x_i's as it has a special role in our analysis. The developed convergence guarantee covers a variety of nonconvex functions such as piecewise linear functions, ℓ_q quasi-norm, Schatten-q quasi-norm (0 …

867 citations

Posted Content
TL;DR: In this article, an alternative inexact block coordinate descent (BCD) approach is proposed, which updates the variable blocks by successively minimizing a sequence of approximations of f which are either locally tight upper bounds of f or strictly convex local approximates of f. The convergence properties for a fairly wide class of such methods, especially for the cases where the objective functions are either non-differentiable or nonconvex.
Abstract: The block coordinate descent (BCD) method is widely used for minimizing a continuous function f of several block variables. At each iteration of this method, a single block of variables is optimized, while the remaining variables are held fixed. To ensure the convergence of the BCD method, the subproblem to be optimized in each iteration needs to be solved exactly to its unique optimal solution. Unfortunately, these requirements are often too restrictive for many practical scenarios. In this paper, we study an alternative inexact BCD approach which updates the variable blocks by successively minimizing a sequence of approximations of f which are either locally tight upper bounds of f or strictly convex local approximations of f. We focus on characterizing the convergence properties for a fairly wide class of such methods, especially for the cases where the objective functions are either non-differentiable or nonconvex. Our results unify and extend the existing convergence results for many classical algorithms such as the BCD method, the difference of convex functions (DC) method, the expectation maximization (EM) algorithm, as well as the alternating proximal minimization algorithm.

684 citations

Journal ArticleDOI
TL;DR: The state of the art in continuous optimization methods for such problems is described, with particular emphasis on optimal first-order schemes that can deal with typical non-smooth and large-scale objective functions used in imaging problems.
Abstract: A large number of imaging problems reduce to the optimization of a cost function , with typical structural properties. The aim of this paper is to describe the state of the art in continuous optimization methods for such problems, and present the most successful approaches and their interconnections. We place particular emphasis on optimal first-order schemes that can deal with typical non-smooth and large-scale objective functions used in imaging problems. We illustrate and compare the different algorithms using classical non-smooth problems in imaging, such as denoising and deblurring. Moreover, we present applications of the algorithms to more advanced problems, such as magnetic resonance imaging, multilabel image segmentation, optical flow estimation, stereo matching, and classification.

477 citations


Cites background from "Convergence of descent methods for ..."

  • ...In general, block coordinate (or Gauss–Seidel) descent schemes can be implemented in many ways, and many generalizations involving non-smooth terms are proved to converge in the literature (Grippo and Sciandrone 2000, Auslender 1976, Attouch et al. 2013)....

    [...]

  • ...…for The convergence of alternating minimizations or proximal (implicit) descent steps in this setting (which is not necessarily covered by the general approach of Tseng 2001) has been studied by Attouch et al. (2013), Attouch, Bolte, Redont and Soubeyran (2010) and Beck and Tetruashvili (2013)....

    [...]

References
Book
D.L. Donoho1
01 Jan 2004
TL;DR: It is possible to design n=O(Nlog(m)) nonadaptive measurements allowing reconstruction with accuracy comparable to that attainable with direct knowledge of the N most important coefficients, and a good approximation to those N important coefficients is extracted from the n measurements by solving a linear program-Basis Pursuit in signal processing.
Abstract: Suppose x is an unknown vector in ℝᵐ (a digital image or signal); we plan to measure n general linear functionals of x and then reconstruct. If x is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements n can be dramatically smaller than the size m. Thus, certain natural classes of images with m pixels need only n = O(m^{1/4} log^{5/2}(m)) nonadaptive nonpixel samples for faithful recovery, as opposed to the usual m pixel samples. More specifically, suppose x has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor), so the coefficients belong to an ℓ_p ball for 0 …

18,609 citations

Book
01 Jun 1970
TL;DR: In this article, the authors present a list of basic reference books for convergence of Minimization Methods in linear algebra and linear algebra with a focus on convergence under partial ordering.
Abstract: Preface to the Classics Edition Preface Acknowledgments Glossary of Symbols Introduction Part I. Background Material. 1. Sample Problems 2. Linear Algebra 3. Analysis Part II. Nonconstructive Existence Theorems. 4. Gradient Mappings and Minimization 5. Contractions and the Continuation Property 6. The Degree of a Mapping Part III. Iterative Methods. 7. General Iterative Methods 8. Minimization Methods Part IV. Local Convergence. 9. Rates of Convergence-General 10. One-Step Stationary Methods 11. Multistep Methods and Additional One-Step Methods Part V. Semilocal and Global Convergence. 12. Contractions and Nonlinear Majorants 13. Convergence under Partial Ordering 14. Convergence of Minimization Methods An Annotated List of Basic Reference Books Bibliography Author Index Subject Index.

7,669 citations

Book
01 Jan 1987
TL;DR: This book describes the first unified theory of polynomial-time interior-point methods, and describes several of the new algorithms described, e.g., the projective method, which have been implemented, tested on "real world" problems, and found to be extremely efficient in practice.
Abstract: Written for specialists working in optimization, mathematical programming, or control theory. The general theory of path-following and potential reduction interior point polynomial time methods, interior point methods, interior point methods for linear and quadratic programming, polynomial time methods for nonlinear convex programming, efficient computation methods for control problems and variational inequalities, and acceleration of path-following methods are covered. In this book, the authors describe the first unified theory of polynomial-time interior-point methods. Their approach provides a simple and elegant framework in which all known polynomial-time interior-point methods can be explained and analyzed; this approach yields polynomial-time interior-point methods for a wide variety of problems beyond the traditional linear and quadratic programs. The book contains new and important results in the general theory of convex programming, e.g., their "conic" problem formulation in which duality theory is completely symmetric. For each algorithm described, the authors carefully derive precise bounds on the computational effort required to solve a given family of problems to a given precision. In several cases they obtain better problem complexity estimates than were previously known. Several of the new algorithms described in this book, e.g., the projective method, have been implemented, tested on "real world" problems, and found to be extremely efficient in practice. Contents: Chapter 1: Self-Concordant Functions and Newton Method; Chapter 2: Path-Following Interior-Point Methods; Chapter 3: Potential Reduction Interior-Point Methods; Chapter 4: How to Construct Self-Concordant Barriers; Chapter 5: Applications in Convex Optimization; Chapter 6: Variational Inequalities with Monotone Operators; Chapter 7: Acceleration for Linear and Linearly Constrained Quadratic Problems

3,690 citations


"Convergence of descent methods for ..." refers background in this paper

  • ...metric regularity [2, 40, 41], cohypomonotonicity [48, 35], self-concordance [47], partial smoothness [39, 56]....

    [...]

Journal ArticleDOI
TL;DR: The theory proposed here provides a taxonomy for numerical linear algebra algorithms, giving a top-level mathematical view of previously unrelated algorithms; developers of new algorithms and perturbation theories will benefit from the theory.
Abstract: In this paper we develop new Newton and conjugate gradient algorithms on the Grassmann and Stiefel manifolds. These manifolds represent the constraints that arise in such areas as the symmetric eigenvalue problem, nonlinear eigenvalue problems, electronic structures computations, and signal processing. In addition to the new algorithms, we show how the geometrical framework gives penetrating new insights allowing us to create, understand, and compare algorithms. The theory proposed here provides a taxonomy for numerical linear algebra algorithms that provide a top level mathematical view of previously unrelated algorithms. It is our hope that developers of new algorithms and perturbation theories will benefit from the theory, methods, and examples in this paper.

2,686 citations


"Convergence of descent methods for ..." refers methods in this paper

  • ...Numerical analysis [50]: cone of positive semidefinite matrices, Stiefel manifold (spheres, orthogonal group [38]), matrices with fixed rank....

    [...]

Journal ArticleDOI
TL;DR: It is shown that various inverse problems in signal recovery can be formulated as the generic problem of minimizing the sum of two convex functions with certain regularity properties, which makes it possible to derive existence, uniqueness, characterization, and stability results in a unified and standardized fashion for a large class of apparently disparate problems.
Abstract: We show that various inverse problems in signal recovery can be formulated as the generic problem of minimizing the sum of two convex functions with certain regularity properties. This formulation makes it possible to derive existence, uniqueness, characterization, and stability results in a unified and standardized fashion for a large class of apparently disparate problems. Recent results on monotone operator splitting methods are applied to establish the convergence of a forward-backward algorithm to solve the generic problem. In turn, we recover, extend, and provide a simplified analysis for a variety of existing iterative methods. Applications to geometry/texture image decomposition schemes are also discussed. A novelty of our framework is to use extensively the notion of a proximity operator, which was introduced by Moreau in the 1960s.

2,645 citations


"Convergence of descent methods for ..." refers background or methods in this paper

  • ...This kind of structured problem occurs frequently, see for instance [24, 6] and Example 5....

    [...]

  • ...By applying the forward-backward splitting algorithm to this problem, we aim at finding a point which satisfies the hard constraints modelled by F , while the other constraints are satisfied in a possibly weaker sense (see [24] and references therein)....

    [...]

Frequently Asked Questions (14)
Q1. What contributions have the authors mentioned in the paper "Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized gauss-seidel methods" ?

In view of the minimization of a nonsmooth nonconvex function f, the authors prove an abstract convergence result for descent methods satisfying a sufficient-decrease assumption, and allowing a relative error tolerance. The specialization of their result to different kinds of structured problems provides several new convergence results for inexact versions of the gradient method, the proximal method, the forward-backward splitting algorithm, the gradient projection and some proximal regularization of the Gauss-Seidel method in a nonconvex setting. 

The computational implementation of the methods analyzed in this paper, as well as these stopping rules are topics for future research. 

If x0 is sufficiently close to ⋂_{i=1}^p Fi, then the inexact averaged projection algorithm reduces to the gradient method x^{k+1} = x^k − (θ/p) ∇f(x^k) + ε^k, with f being given by (23), which therefore defines a unique sequence. 

Due to the fact that a convex function has at most one critical value, the bounded sequences generated by the above algorithms converge to a global minimizer. 

The first condition is intended to model a descent property: since it involves a measure of the quality of the descent, the authors call it a sufficient-decrease condition (see [20] for an early paper on this subject, [7] for an interpretation of this condition in decision sciences, and [43] for a discussion on this type of condition in a particular nonconvex nonsmooth optimization setting). 

In this paper, their central assumption for the study of such algorithms is that the function f satisfies the (nonsmooth) Kurdyka–Łojasiewicz inequality, which means, roughly speaking, that the functions under consideration are sharp up to a reparametrization (see Section 2.2). 

If (xk)k∈N is bounded, then it converges to a minimizer of f and the sequence of values f(xk) converges to the program value min f . 

In the context of optimization, the importance of the Kurdyka–Łojasiewicz inequality is due to the fact that many problems involve functions satisfying such inequalities, and it is often elementary to check that such an inequality is satisfied; real semi-algebraic functions provide a very rich class of functions satisfying the Kurdyka–Łojasiewicz inequality; see [5] for a thorough discussion on these aspects, and also Section 2.2 for a simple illustration. 

For each i in {1, . . . , p}, take a sequence of symmetric positive definite matrices (A_i^k)_{k∈ℕ} of size n_i such that the eigenvalues of each A_i^k (k ∈ ℕ, i ∈ {1, . . . , p}) lie in [λ, λ̄]. 

In these cases, over-solving the minimization subproblems would increase the computational burden of the method, and may slow down the final computation of a good approximation of the solution. 

To see that g + h is a KL function, the authors simply note that h is a polynomial function and that ‖ · ‖0 has a piecewise linear graph, hence the sum g + h is semi-algebraic. 

On the other hand, under-solving the minimization subproblems may result in a breakdown of the algorithm, and convergence to a solution may be lost. 

The subdifferential of f at x ∈ dom f, written ∂f(x), is defined as follows: ∂f(x) := {v ∈ ℝⁿ : ∃ x^k → x, f(x^k) → f(x), v^k ∈ ∂̂f(x^k), v^k → v}. 

When n = 1, the counting norm is denoted by | · |0; in that case one easily establishes thatproxγλ|·|0u = u if |u| > √2γλ {0, u} if |u| = √2γλ 0 otherwise.