
A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications

TL;DR: A framework is introduced that circumvents the intricate question of Lipschitz continuity of gradients by using an elegant and easy-to-check convexity condition which captures the geometry of the constraints.
Abstract: The proximal gradient method and its variants form one of the most attractive classes of first-order algorithms for minimizing the sum of two convex functions, one of which is nonsmooth. However, it requires the differentiable part of the objective to have a Lipschitz continuous gradient, thus precluding its use in many applications. In this paper we introduce a framework which allows one to circumvent the intricate question of Lipschitz continuity of gradients by using an elegant and easy-to-check convexity condition which captures the geometry of the constraints. This condition translates into a new descent lemma which in turn leads to a natural derivation of the proximal-gradient scheme with Bregman distances. We then identify a new notion of asymmetry measure for Bregman distances, which is central in determining the relevant step-size. These novelties allow us to prove a global sublinear rate of convergence, and, as a by-product, global pointwise convergence is obtained. This provides a new path to a broad spectrum of problems arising in key applications which were, until now, considered out of reach via proximal gradient methods. We illustrate this potential by showing how our results can be applied to build new and simple schemes for Poisson inverse problems.

Summary (2 min read)

Contribution and Outline

  • The methodology underlying their approach and leading to a proximal-based algorithm freed from Lipschitz gradient continuity is developed in Section 2.
  • A key player is a new simple, yet useful descent Lemma which allows one to trade Lipschitz continuity of the gradient for an elementary convexity property.
  • In particular, an important notion of asymmetry coefficient is introduced and shown to play a central role in determining the relevant step size of the proposed scheme.
  • The method is presented in Section 3 and its analysis is developed in Section 4, where a sublinear O(1/k) rate of convergence is established without the traditional Lipschitz gradient continuity of the smooth function.
  • To demonstrate the benefits and potential of their new approach, the authors illustrate in Section 5 how it can be successfully applied to a broad class of Poisson linear inverse problems, leading to new proximal-based algorithms for these problems.

2.2. A New Descent Lemma Beyond Lipschitz Continuity

  • The use of Bregman distances in optimization within various contexts is widespread and cannot be reviewed here.
  • Many interesting results connecting, for example, Bregman proximal distances with dynamical systems can be found in [10] and references therein, and many more properties and applications can be found in the fundamental and comprehensive work [2].

Here D_h(x, y) := h(x) − h(y) − ⟨∇h(y), x − y⟩ denotes the Bregman distance associated with h. Clearly, D_h(x, y) ≥ 0, with equality if and only if x = y when h is strictly convex.

  • The authors are then ready to establish the simple but key extended descent lemma.
  • It is easy to see that condition (LC) admits various alternative reformulations which can facilitate checking it, and which the authors conveniently collect in the following proposition.
  • Proposition 1. Consider the pair of functions (g, h) and assume that the above regularity conditions on h and g hold.
  • The proof follows easily from the definition of the Bregman distance and the usual convexity properties.
  • This holds with g(x) = x log x, which does not have a Lipschitz continuous gradient; a numerical check of the resulting extended descent lemma is sketched after this list.
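As a quick illustration of the trade described above, the following sketch checks the extended descent inequality g(x) ≤ g(y) + ⟨∇g(y), x − y⟩ + L·D_h(x, y) numerically for g(x) = Σ x_i log x_i, whose gradient is not Lipschitz on the positive orthant. The reference kernel h = g + ½‖·‖² and the constant L = 1 are our own illustrative choices (then L·h − g = ½‖·‖² is convex, so condition (LC) holds); they are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Smooth part: g(x) = sum(x*log x); its gradient log(x)+1 blows up near 0,
# so g has no globally Lipschitz gradient on the positive orthant.
g      = lambda x: np.sum(x * np.log(x))
grad_g = lambda x: np.log(x) + 1.0

# Illustrative reference kernel (our choice): h = g + 0.5*||x||^2, a Legendre
# function on R^d_{++}. Then L*h - g = 0.5*||x||^2 is convex with L = 1,
# i.e. condition (LC) holds with L = 1.
h      = lambda x: g(x) + 0.5 * x @ x
grad_h = lambda x: grad_g(x) + x
D_h    = lambda x, y: h(x) - h(y) - grad_h(y) @ (x - y)   # Bregman distance

L = 1.0
for _ in range(10000):
    x = rng.random(6) * 10 + 1e-6    # points in int dom h, some very close to 0
    y = rng.random(6) * 10 + 1e-6
    lhs = g(x)
    rhs = g(y) + grad_g(y) @ (x - y) + L * D_h(x, y)
    assert lhs <= rhs + 1e-9         # extended descent inequality
print("extended descent inequality verified on all sampled pairs")
```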

2.3. A Symmetry Measure for D_h

  • Since h is strictly convex, the objective in (11) has at most one minimizer.
  • Assuming (i), the authors obtain that the function in question plus the indicator i_{dom h} is coercive; since Bregman distances are nonnegative, the objective within T is also coercive and thus T is nonempty.
  • When assuming (ii), the argument follows from the supercoercivity properties of the same objective, see [4].
  • That T(x) lies in the interior of dom h can be seen through the optimality condition for T(x), which implies that ∂h(T(x)) must be nonempty; a one-line closed form for the special case f ≡ 0 is sketched after this list.
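To give a concrete sense of how the map T can be evaluated, here is a minimal sketch for the special case f ≡ 0, where the optimality condition reduces to ∇h(u) = ∇h(x) − λ∇g(x) and can be inverted through ∇h*. The Boltzmann-Shannon-type kernel, the quadratic objective, and the step size are illustrative choices of ours, not the paper's.

```python
import numpy as np

# Boltzmann-Shannon-type kernel h(x) = sum(x*log x - x) on R^d_{++} (illustrative choice)
grad_h      = lambda x: np.log(x)
grad_h_conj = lambda y: np.exp(y)          # inverse of grad_h, cf. identity (3)

def bregman_prox_step(x, grad_g_x, lam):
    """T(x) for f = 0: solve grad_h(u) = grad_h(x) - lam * grad_g(x)."""
    return grad_h_conj(grad_h(x) - lam * grad_g_x)

# Example: one step on g(x) = 0.5*||x - c||^2 from a strictly positive point
c = np.array([1.0, 2.0, 0.5])
x = np.array([3.0, 0.2, 1.0])
u = bregman_prox_step(x, x - c, lam=0.3)
print(u)                                    # stays automatically in int dom h = R^d_{++}
```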

Example 2.

  • Note that this yields a nonseparable Bregman distance which is relevant for ball constraints; one such kernel is sketched after this list.
  • Concerning NoLips, the situation is exactly the same: for a given kernel h, sets and functions which are prox-friendly are scarce and are modeled on h.
  • However, a major advantage of their approach is that one can choose the kernel h to adapt to the geometry of the given function/set.
  • This situation will be illustrated in Section 5, for a broad class of inverse problems involving Poisson noise.
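The summary does not reproduce Example 2 itself; as a hedged illustration consistent with the bullet above, one nonseparable Legendre kernel suited to the Euclidean unit ball is h(x) = −√(1 − ‖x‖²). Treat this specific kernel as our assumption rather than the paper's stated example.

```python
import numpy as np

# A nonseparable Legendre kernel for the Euclidean unit ball (our assumed example):
# h(x) = -sqrt(1 - ||x||^2), dom h = closed unit ball, differentiable on its interior.
h      = lambda x: -np.sqrt(1.0 - x @ x)
grad_h = lambda x: x / np.sqrt(1.0 - x @ x)       # blows up at the boundary (essential smoothness)

def D_h(x, y):
    """Bregman distance induced by h; couples coordinates through ||x||^2."""
    return h(x) - h(y) - grad_h(y) @ (x - y)

x = np.array([0.3, -0.2, 0.1])
y = np.array([0.0,  0.5, 0.4])
print(D_h(x, y), D_h(y, x))   # nonnegative, and generally not symmetric
```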

Example 3.

  • Here the symmetry coefficient α(h) and the relative convexity constant L play central roles; a numerical estimate of α(h) for two standard kernels is sketched after this list.
  • These issues are strongly related to the geometric features of h, but also to how well they match the pair (f + g, dom h).
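A rough way to get a feel for the symmetry coefficient, understood here as α(h) = inf{ D_h(x, y)/D_h(y, x) : x ≠ y in int dom h }, is Monte-Carlo sampling, which yields an upper bound on the infimum. The kernels, dimension and sampling ranges below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)

def symmetry_estimate(h, grad_h, sampler, n=50000):
    """Monte-Carlo upper bound on alpha(h) = inf D_h(x,y)/D_h(y,x) over x != y."""
    D = lambda x, y: h(x) - h(y) - grad_h(y) @ (x - y)
    best = 1.0
    for _ in range(n):
        x, y = sampler(), sampler()
        dxy, dyx = D(x, y), D(y, x)
        if dyx > 1e-12:
            best = min(best, dxy / dyx)
    return best

d = 3
# Squared Euclidean norm: D_h is symmetric, so alpha(h) = 1.
print(symmetry_estimate(lambda x: 0.5 * x @ x, lambda x: x,
                        lambda: rng.standard_normal(d)))
# Boltzmann-Shannon entropy: alpha(h) = 0; even coarse sampling gives a bound well below 1.
print(symmetry_estimate(lambda x: np.sum(x * np.log(x)), lambda x: np.log(x) + 1.0,
                        lambda: rng.random(d) * 5 + 1e-4))
```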

Assumptions H:

  • Remark 4. (a) All the examples given previously (the Boltzmann-Shannon, Fermi-Dirac, Hellinger and Burg entropies) satisfy the above set of assumptions.
  • For much more general and accurate results on the interplay between Legendre functions and Bregman separation properties on the boundary, the authors refer the reader to [2].
  • Theorem 1 recovers and extends the complexity/convergence results of [15, Theorem 3.4].
  • Formally, the authors are dealing with linear inverse problems that can be conveniently described as follows.
  • This class of problems is sufficiently broad to illustrate the theory and algorithm the authors have developed.

5.2. Two Simple Algorithms for Poisson Linear Inverse Problems

  • This work outlines in a simple and transparent way the basic ingredients needed to apply the proximal gradient methodology when the gradient of the smooth part in the composite model (P) is not Lipschitz continuous.
  • Thanks to a new and natural extension of the descent Lemma and a sharp definition of the step-size through the notion of symmetry, the authors have shown that NoLips shares convergence and complexity results akin to those of the usual proximal gradient.
  • The last section illustrates the potential of the proposed framework when applied to the key research area of linear inverse problems with Poisson noise arising in the imaging sciences; a sketch of such a scheme follows this list.
  • On the theoretical side, their approach lays the ground for many new and promising perspectives for gradient-based methods that were not conceivable before.
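The following minimal sketch shows the kind of closed-form, multiplicative-like scheme this framework suggests for Poisson linear inverse problems in the unregularized case f ≡ 0, using the Burg entropy kernel h(x) = −Σ log x_j. The relative-smoothness constant L = ‖b‖₁ and the step-size rule below reflect our reading of this setting and should be treated as assumptions rather than the paper's exact statement.

```python
import numpy as np

def nolips_poisson(A, b, iters=200, safety=0.99):
    """Bregman proximal gradient (NoLips-style) sketch for
    min_x sum_i (<a_i, x> - b_i * log <a_i, x>)  over x > 0,
    with the Burg kernel h(x) = -sum_j log x_j and L = ||b||_1 (assumed)."""
    d = A.shape[1]
    x = np.ones(d)                       # strictly positive starting point
    lam = safety / np.sum(b)             # step size below 1/L keeps the iterates positive
    for _ in range(iters):
        Ax = A @ x
        grad = A.T @ (1.0 - b / Ax)      # gradient of the smooth Poisson data term
        # Closed-form Bregman step: -1/x+ = -1/x - lam*grad  =>  x+ = x / (1 + lam*x*grad)
        x = x / (1.0 + lam * x * grad)
    return x

# Synthetic nonnegative data (illustrative only)
rng = np.random.default_rng(5)
A = rng.random((50, 20))
x_true = rng.random(20) + 0.1
b = rng.poisson(A @ x_true) + 1e-3       # keep b strictly positive for the log term
x_hat = nolips_poisson(A, b)
print("relative residual:", np.linalg.norm(A @ x_hat - b) / np.linalg.norm(b))
```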


Seediscussions,stats,andauthorprofilesforthispublicationat:https://www.researchgate.net/publication/308313905
ADescentLemmaBeyondLipschitzGradient
Continuity:First-OrderMethodsRevisitedand
Applications
ArticleinMathematicsofOperationsResearch·July2016
DOI:10.1287/moor.2016.0817
CITATIONS
16
READS
1,577
3authors,including:
Someoftheauthorsofthispublicationarealsoworkingontheserelatedprojects:
NonconvexOptimizationViewproject
JérômeBolte
ToulouseSchoolofEconomics
51PUBLICATIONS2,199CITATIONS
SEEPROFILE
AllcontentfollowingthispagewasuploadedbyJérômeBolteon19September2016.
Theuserhasrequestedenhancementofthedownloadedfile.

A descent Lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications

Heinz H. Bauschke
Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada, heinz.bauschke@ubc.ca

Jérôme Bolte
Toulouse School of Economics, Université Toulouse Capitole, Manufacture des Tabacs, 21 allée de Brienne, 31015 Toulouse, France, jerome.bolte@ut-capitole.fr

Marc Teboulle
School of Mathematical Sciences, Tel Aviv University, Ramat Aviv 69978, Israel, teboulle@post.tau.ac.il
Key words: first-order methods, composite nonsmooth convex minimization, descent lemma, proximal-gradient algorithms, complexity, Bregman distance, multiplicative Poisson linear inverse problems
MSC2000 subject classification: 90C25, 65K05
OR/MS subject classification: Convex Programming/Algorithms
1. Introduction  First-order methods have occupied the forefront of research in continuous optimization for more than a decade. This is due to their wide applicability in a huge spectrum of fundamental and disparate applications such as signal processing, image sciences, machine learning, communication systems, and astronomy, to mention just a few, but also to their computational simplicity, which makes them ideal methods for solving big data problems within medium accuracy levels. Recent research activities in this field are still conducted at a furious pace in all the aforementioned applications (and many more), as testified by the large volume of literature; see e.g. [29, 34] and references therein for an appetizer.
A fundamental generic optimization model that encompasses various classes of smooth/nonsmooth convex models arising in the alluded applications is the well known composite minimization problem, which consists in minimizing the sum of a possibly nonsmooth extended valued function and a differentiable one over a real Euclidean space X (see a more precise description in §2):

\[
(\mathrm{P}) \qquad \inf\{\, f(x) + g(x) \;:\; x \in X \,\}.
\]

Despite its striking simplicity, this model is very rich and has led to the development of fundamental and well known algorithms. A mother scheme is the so-called forward-backward splitting method, which goes back at least to Passty [30] and Bruck [12] and which was developed in the more general setting of maximal monotone operators. When specialized to the convex problem (P), this method is often called the proximal gradient method (PGM), a terminology we adopt in this article. One of the earliest works describing and analyzing the PGM is, for example, that of Fukushima and Mine [21]. The more recent work by Combettes and Wajs [17] provides important foundational insights and has popularized the method for a wide audience. More recently, the introduction of fast versions of the PGM, such as FISTA by Beck and Teboulle [5], which extends the seminal and fundamental work on the optimal gradient methods of Nesterov [27], has resulted in a burst of research activities.
A central property required in the analysis of gradient methods, like the PGM, is that of the Lipschitz continuity of the gradient of the smooth part. Such a property implies (and for a convex function is equivalent to) the so-called descent Lemma, e.g., [9], which provides a quadratic upper approximation to the smooth part. This simple process is at the root of the proximal gradient method, as well as many other methods. However, in many applications the differentiable function does not have such a property, e.g., in the broad class of Poisson inverse problems (see e.g. the recent review paper [8], which also includes over 130 references), therefore precluding the use of the PGM methodology. When both f and g have an easily computable proximal operator, one could also consider tackling the composite model (P) by applying the alternating direction method of multipliers (ADM) scheme [23]. For many problems, these schemes are known to be quite efficient. However, note that even in simple cases, one faces several serious difficulties that we now briefly recall. First, being a primal-dual splitting method, the ADM scheme may considerably increase the dimension of the problem (by the introduction of auxiliary splitting variables). Secondly, the method depends on one (or more) unknown penalty parameter that needs to be heuristically chosen. Finally, to our knowledge, the convergence rate results of ADM based schemes are weaker, holding only for the primal-dual gap in terms of ergodic sequences, see [14, 25, 33] and references therein. Moreover, the complexity bound constant not only depends on the unknown penalty parameter, but also on the norm of the matrix defining the splitting, which in many applications can be huge.
The main goal of this paper is to rectify this situation. We introduce a framework which allows us to derive a class of proximal gradient based algorithms which are proven to share most of the convergence properties and complexity of the classical proximal gradient, yet where the usual restrictive condition of Lipschitz continuity of the gradient of the differentiable part of problem (P) is not required. It is instead traded for a more general and flexible convexity condition which involves the problem's data and can be specified by the user for each given problem. This is a new path to a broad spectrum of optimization models arising in key applications which were not accessible before. Surprisingly, the derivation and the development of our results start from a very simple fact (which appears to have been overlooked) which underlines that the main ingredient in the success of the PGM is to have an appropriate descent Lemma, i.e., an adequate upper approximation of the objective function.
Contribution and Outline  The methodology underlying our approach and leading to a proximal-based algorithm freed from Lipschitz gradient continuity is developed in Section 2. A key player is a new simple, yet useful descent Lemma which allows us to trade Lipschitz continuity of the gradient for an elementary convexity property. We further clarify these results by deriving several properties and examples and highlighting the key differences with the traditional proximal gradient method. In particular, an important notion of asymmetry coefficient is introduced and shown to play a central role in determining the relevant step size of the proposed scheme. The method is presented in Section 3 and its analysis is developed in Section 4, where a sublinear

Bauschke, Bolte, and Teboulle: ADescentLemmaBeyondLipschitzContinuityforfirst-orderMethods
Mathematics of Op erations Research 00(0), pp. 000–000,
c
0000 INFORMS 3
O(1/k) rate of convergence is established without the traditional Lipschitz gradient continuity of the smooth function. As a by-product, pointwise convergence of the method is also established. To demonstrate the benefits and potential of our new approach, we illustrate in Section 5 how it can be successfully applied to a broad class of Poisson linear inverse problems, leading to new proximal-based algorithms for these problems.
Notation  Throughout the paper, the notation we employ is standard and as in [32] or [4]. We recall that for any set C, i_C(·) stands for the usual indicator function, which is equal to 0 if x ∈ C and +∞ otherwise, and \(\overline{C}\) denotes the closure of C. We set ℝ_{++} = (0, +∞).
2. A New Look at The Proximal Gradient Method  We start by recalling the basic elements underlying the proximal gradient method and its analysis, which motivates the forthcoming developments.
Let X = ℝ^d be a real Euclidean space with inner product ⟨·, ·⟩ and induced norm ‖·‖. Given a closed convex set C with nonempty interior, consider the convex problem

\[
\inf\{\, \Psi(x) := f(x) + g(x) \;:\; x \in C \,\},
\]

where f, g are proper, convex and lower semicontinuous (lsc), with g continuously differentiable on int dom g ≠ ∅ (see later on below for a precise description).
First consider the case when C = ℝ^d. For any fixed given point x ∈ X and any λ > 0, the main step of the proximal gradient method consists in minimizing an upper approximation of the objective obtained by summing a quadratic majorant of the differentiable part g and f, thus leaving untouched the nonsmooth part f of Ψ:

\[
x^+ = \operatorname{argmin}\Big\{\, g(x) + \langle \nabla g(x),\, u - x\rangle + \frac{1}{2\lambda}\|u - x\|^2 + f(u) \;:\; u \in \mathbb{R}^d \,\Big\}.
\]
This is the proximal gradient algorithm, see e.g. [6]. Clearly, the minimizer x^+ exists and is unique, and, ignoring the constant terms in x, it reduces to

\[
x^+ = \operatorname{argmin}_{u}\Big\{\, f(u) + \frac{1}{2\lambda}\,\big\|u - \big(x - \lambda\nabla g(x)\big)\big\|^2 \,\Big\} = \operatorname{prox}_{\lambda f}\big(x - \lambda\nabla g(x)\big), \tag{1}
\]
where prox_φ(·) stands for the so-called Moreau proximal map [26] of a proper lsc convex function φ. Thus, the PG scheme consists of the composition of a proximal (implicit/backward) step on f with a gradient (explicit/forward) step of g.¹
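For concreteness, here is a minimal sketch of iteration (1) for the standard instance g(u) = ½‖Au − b‖² and f(u) = μ‖u‖₁, for which prox_{λf} is componentwise soft-thresholding. The data, the weight μ and the step size λ ≤ 1/L_g are placeholder choices for illustration only.

```python
import numpy as np

def soft_threshold(v, tau):
    # prox of tau * ||.||_1: componentwise shrinkage toward zero
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(A, b, mu, lam, iters=500):
    """Iterate x+ = prox_{lam*f}(x - lam*grad g(x)) for
    g(x) = 0.5*||Ax - b||^2 and f(x) = mu*||x||_1, cf. (1)."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                       # gradient of the smooth part g
        x = soft_threshold(x - lam * grad, lam * mu)   # forward step, then backward (prox) step
    return x

# Example usage with synthetic data; lam <= 1/L_g with L_g = ||A||_2^2
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
b = A @ (rng.standard_normal(100) * (rng.random(100) < 0.1))
lam = 1.0 / np.linalg.norm(A, 2) ** 2
x_hat = proximal_gradient(A, b, mu=0.1, lam=lam)
print("nonzeros:", np.count_nonzero(np.round(x_hat, 6)))
```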
A key assumption needed in the very construction and in the analysis of the PG scheme is that g admits a Lipschitz continuous gradient, with Lipschitz constant L_g. As a simple consequence of this assumption (for a convex function g, this is an equivalence), we obtain the so-called descent Lemma, see e.g., [9], namely for any L ≥ L_g,

\[
g(x) \le g(y) + \langle x - y, \nabla g(y)\rangle + \frac{L}{2}\|x - y\|^2, \qquad \forall x, y \in \mathbb{R}^d. \tag{2}
\]

This inequality not only naturally provides an upper quadratic approximation of g, but is also a crucial pillar in the analysis of any PG based method.
This leads us to the following simple observation:
¹ This also can be seen by convex calculus, which gives 0 ∈ λ(∂f(x^+) + ∇g(x)) + x^+ − x, which is equivalent to x^+ = (Id + λ∂f)^{-1}(Id − λ∇g)(x) = prox_{λf}(x − λ∇g(x)).

Bauschke, Bolte, and Teboulle: ADescentLemmaBeyondLipschitzContinuityforfirst-orderMethods
4 Mathematics of Op e r ati on s Research 00(0), pp. 000–000,
c
0000 INFORMS
Main Observation  Developing the squared norm in (2), simple algebra shows that it can be equivalently written as:

\[
\frac{L}{2}\|x\|^2 - g(x) \;\ge\; \frac{L}{2}\|y\|^2 - g(y) + \langle Ly - \nabla g(y),\, x - y\rangle, \qquad \forall x, y \in \mathbb{R}^d,
\]

which in turn is nothing else but the gradient inequality for the convex function (L/2)‖x‖² − g(x). Thus, for a given smooth convex function g on ℝ^d, the descent Lemma is equivalent to saying that (L/2)‖x‖² − g(x) is convex on ℝ^d.
This elementary and known fact (see, e.g., [4, Theorem 18.15(vi)]) seems to have been overlooked. It naturally suggests considering, instead of the squared norm used for the unconstrained case C = ℝ^d, a more general convex function that captures the geometry of the constraint C. This provides the motivation for the forthcoming proximal gradient based algorithm and its analysis for the constrained composite problem (P).
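A small numerical illustration of this observation: for the smooth convex function g(x) = Σ log(1 + e^{x_i}), whose gradient is 1/4-Lipschitz, both the descent inequality (2) with L = 1/4 and the gradient inequality of φ = (L/2)‖·‖² − g hold on randomly sampled pairs, as the equivalence predicts. The particular g and the sampling below are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 0.25  # Lipschitz constant of grad g for g(x) = sum(log(1 + exp(x)))

def g(x):
    return np.sum(np.logaddexp(0.0, x))      # log(1 + e^x), numerically stable

def grad_g(x):
    return 1.0 / (1.0 + np.exp(-x))          # sigmoid

def phi(x):
    return 0.5 * L * x @ x - g(x)            # the function whose convexity encodes (2)

def grad_phi(x):
    return L * x - grad_g(x)

for _ in range(10000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    descent = g(x) <= g(y) + grad_g(y) @ (x - y) + 0.5 * L * (x - y) @ (x - y) + 1e-12
    gradient_ineq = phi(x) >= phi(y) + grad_phi(y) @ (x - y) - 1e-12
    assert descent and gradient_ineq
print("descent lemma and convexity of (L/2)||x||^2 - g(x) hold on all sampled pairs")
```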
2.1. The Constrained Composite Problem  Our strategy to handle the constraint set C is standard: a Legendre function on C is chosen and its associated Bregman distance is used as a proximity measure. Let us first recall the definition of a Legendre function.
Definition 1 (Legendre functions). [32, Chapter 26] Let h : X → (−∞, +∞] be a lsc proper convex function. It is called:
(i) essentially smooth, if h is differentiable on int dom h, with moreover ‖∇h(x_k)‖ → +∞ for every sequence {x_k}_{k∈ℕ} ⊂ int dom h converging to a boundary point of dom h as k → +∞;
(ii) of Legendre type if h is essentially smooth and strictly convex on int dom h.
Also, let us recall the useful fact that h is of Legendre type if and only if its conjugate h* is of Legendre type. Moreover, the gradient of a Legendre function h is a bijection from int dom h to int dom h*, and its inverse is the gradient of the conjugate ([32, Thm 26.5]); that is, we have

\[
(\nabla h)^{-1} = \nabla h^* \quad \text{and} \quad h^*(\nabla h(x)) = \langle x, \nabla h(x)\rangle - h(x). \tag{3}
\]

Recall also that

\[
\operatorname{dom}\partial h = \operatorname{int\,dom} h \quad \text{with} \quad \partial h(x) = \{\nabla h(x)\}, \qquad \forall x \in \operatorname{int\,dom} h. \tag{4}
\]
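As a concrete check of identity (3), consider the Boltzmann-Shannon-type kernel h(x) = Σ(x_i log x_i − x_i) on the positive orthant, a standard Legendre function whose conjugate is h*(y) = Σ e^{y_i}; this specific kernel is chosen only for illustration.

```python
import numpy as np

# Legendre kernel h(x) = sum(x*log x - x) on the positive orthant (illustrative choice)
h       = lambda x: np.sum(x * np.log(x) - x)
grad_h  = lambda x: np.log(x)
h_conj  = lambda y: np.sum(np.exp(y))   # h*(y) = sum(exp(y))
grad_hc = lambda y: np.exp(y)           # gradient of the conjugate

rng = np.random.default_rng(2)
x = rng.random(4) + 0.1                 # a point in int dom h = R^d_{++}

# (3): (grad h)^{-1} = grad h*  and  h*(grad h(x)) = <x, grad h(x)> - h(x)
assert np.allclose(grad_hc(grad_h(x)), x)
assert np.isclose(h_conj(grad_h(x)), x @ grad_h(x) - h(x))
print("identity (3) verified for the Boltzmann-Shannon-type kernel")
```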
The Problem and Blanket Assumptions  Our aim is thus to solve

\[
v(\mathrm{P}) = \inf\{\, \Psi(x) := f(x) + g(x) \;:\; x \in \overline{\operatorname{dom}}\, h \,\},
\]

where \(\overline{\operatorname{dom}}\, h = \overline{C}\) denotes the closure of dom h.
The following assumptions on the problem's data are made throughout the paper (and referred to as the blanket assumptions).
Assumption A
(i) f : X → (−∞, +∞] is proper lower semicontinuous (lsc) convex,
(ii) h : X → (−∞, +∞] is of Legendre type,
(iii) g : X → (−∞, +∞] is proper lsc convex with dom g ⊇ dom h, and is differentiable on int dom h,
(iv) dom f ∩ int dom h ≠ ∅,
(v) −∞ < v(P) = inf{Ψ(x) : x ∈ \(\overline{\operatorname{dom}}\, h\)} = inf{Ψ(x) : x ∈ dom h}.
Note that the second equality in (v) follows e.g. from [4, Proposition 11.1(iv)] and (iv), because dom(f + g) ∩ int dom h = dom f ∩ int dom h ≠ ∅.


References
Book
01 Mar 2004
TL;DR: A comprehensive introduction to convex optimization, focused on recognizing convex optimization problems and then finding the most appropriate technique for solving them.
Abstract: Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.

33,341 citations

Journal ArticleDOI
TL;DR: A new fast iterative shrinkage-thresholding algorithm (FISTA) which preserves the computational simplicity of ISTA but with a global rate of convergence which is proven to be significantly better, both theoretically and practically.
Abstract: We consider the class of iterative shrinkage-thresholding algorithms (ISTA) for solving linear inverse problems arising in signal/image processing. This class of methods, which can be viewed as an extension of the classical gradient algorithm, is attractive due to its simplicity and thus is adequate for solving large-scale problems even with dense matrix data. However, such methods are also known to converge quite slowly. In this paper we present a new fast iterative shrinkage-thresholding algorithm (FISTA) which preserves the computational simplicity of ISTA but with a global rate of convergence which is proven to be significantly better, both theoretically and practically. Initial promising numerical results for wavelet-based image deblurring demonstrate the capabilities of FISTA which is shown to be faster than ISTA by several orders of magnitude.

11,413 citations


"A Descent Lemma Beyond Lipschitz Gr..." refers methods in this paper

  • ...[10] J. Bolte and M. Teboulle, Barrier operators and associated gradient-like dynamical systems for constrained minimization problems, SIAM Journal on Control and Optimization 42 (2003), 1266–1292....


  • ...[35] M. Teboulle, Entropic proximal mappings with application to nonlinear programming, Mathematics of Operations Research 17 (1992), 670–690....


  • ...[33] R. Shefi and M. Teboulle....


  • ...G. Chen and M. Teboulle, Journal of Mathematical Imaging and Vision, (2010), 1–26....


  • ...[5] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Science 2 (2009), 183–202....


01 Feb 1977

5,933 citations


"A Descent Lemma Beyond Lipschitz Gr..." refers methods in this paper

  • ...Notation Throughout the paper, the notation we employ is standard and as in [32] or [4]....


Journal ArticleDOI
TL;DR: A first-order primal-dual algorithm for non-smooth convex optimization problems with known saddle-point structure can achieve O(1/N^2) convergence on problems where the primal or the dual objective is uniformly convex, and it can show linear convergence, i.e. O(ω^N) for some ω ∈ (0,1), on smooth problems.
Abstract: In this paper we study a first-order primal-dual algorithm for non-smooth convex optimization problems with known saddle-point structure. We prove convergence to a saddle-point with rate O(1/N) in finite dimensions for the complete class of problems. We further show accelerations of the proposed algorithm to yield improved rates on problems with some degree of smoothness. In particular we show that we can achieve O(1/N^2) convergence on problems where the primal or the dual objective is uniformly convex, and we can show linear convergence, i.e. O(ω^N) for some ω ∈ (0,1), on smooth problems. The wide applicability of the proposed algorithm is demonstrated on several imaging problems such as image denoising, image deconvolution, image inpainting, motion estimation and multi-label image segmentation.

4,487 citations


"A Descent Lemma Beyond Lipschitz Gr..." refers background in this paper

  • ...Finally, to our knowledge, the convergence rate results of ADM based schemes are weaker, holding only for primal-dual gap in terms of ergodic sequences, see [14, 25, 33] and references therein....
