Convex Analysis: (pms-28)

Book•

21 Feb 1970-

About: The book was published on 1970-02-21 and is currently open access. It has received 986 citations to date. It focuses on the topic of convex analysis.

Citations

Journal Article•DOI•

Rényi Divergence and Kullback-Leibler Divergence

[...]

Tim van Erven¹, Peter Harremoës•Institutions (1)

University of Paris-Sud¹

12 Jun 2014-IEEE Transactions on Information Theory

TL;DR: The authors review and extend the most important properties of Rényi divergence and Kullback-Leibler divergence, noting in particular that the Rényi divergence of order 1 equals the Kullback-Leibler divergence and discussing the relation of the special order 0 to the Gaussian dichotomy and contiguity.

Abstract: Rényi divergence is related to Rényi entropy much like Kullback-Leibler divergence is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and depends on a parameter that is called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibler divergence. We review and extend the most important properties of Rényi divergence and Kullback-Leibler divergence, including convexity, continuity, limits of $\sigma$-algebras, and the relation of the special order 0 to the Gaussian dichotomy and contiguity. We also show how to generalize the Pythagorean inequality to orders different from 1, and we extend the known equivalence between channel capacity and minimax redundancy to continuous channel inputs (for all orders) and present several other minimax results.

1,234 citations
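
The abstract above states that the Rényi divergence of order 1 equals the Kullback-Leibler divergence. A minimal numerical sketch of that statement follows. It assumes finite distributions with strictly positive `q` and uses the standard discrete definition D_alpha(P||Q) = log(sum_i p_i^alpha q_i^(1-alpha)) / (alpha - 1), which is not quoted in this listing. The function names and the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P||Q) in nats for finite distributions.

    Assumes every q[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def renyi_divergence(p, q, alpha):
    """Rényi divergence of order alpha > 0 (in nats), assuming every q[i] > 0.

    Order 1 is defined by continuity and handled as the KL divergence.
    """
    if alpha <= 0:
        raise ValueError("this sketch only covers orders alpha > 0")
    if abs(alpha - 1.0) < 1e-12:
        return kl_divergence(p, q)
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1.0)


# Hypothetical example distributions on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

for alpha in (0.5, 0.999, 1.0, 1.001, 2.0):
    print(alpha, renyi_divergence(p, q, alpha))
```

The values printed for orders just below and just above 1 should bracket the order-1 value closely, and the divergence should not decrease as the order grows.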

Posted Content•

Coordinate Descent Algorithms

[...]

Stephen J. Wright¹•Institutions (1)

University of Wisconsin-Madison¹

17 Feb 2015-arXiv: Optimization and Control

TL;DR: Coordinate descent algorithms solve optimization problems by successively performing approximate minimization along coordinate directions or coordinate hyperplanes; they are widely used in data analysis, machine learning, and other areas of current interest.

Abstract: Coordinate descent algorithms solve optimization problems by successively performing approximate minimization along coordinate directions or coordinate hyperplanes. They have been used in applications for many years, and their popularity continues to grow because of their usefulness in data analysis, machine learning, and other areas of current interest. This paper describes the fundamentals of the coordinate descent approach, together with variants and extensions and their convergence properties, mostly with reference to convex objectives. We pay particular attention to a certain problem structure that arises frequently in machine learning applications, showing that efficient implementations of accelerated coordinate descent algorithms are possible for problems of this type. We also present some parallel variants and discuss their convergence properties under several models of parallel execution.

659 citations
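
The abstract above describes coordinate descent as successive minimization along coordinate directions. A minimal sketch of the cyclic variant follows, under a simplifying assumption that is not taken from the paper: the objective is the strongly convex quadratic f(x) = 0.5 x^T A x - b^T x, so each one-dimensional minimization can be done exactly rather than approximately. The function name and the example data are hypothetical.

```python
def coordinate_descent(A, b, sweeps=50):
    """Cyclic coordinate descent on f(x) = 0.5 * x^T A x - b^T x.

    Assumes A is symmetric positive definite, so every A[i][i] > 0 and the
    minimizer solves A x = b.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Exact minimization over x[i] with all other coordinates fixed:
            # setting df/dx_i = 0 gives A[i][i] * x[i] = b[i] - sum_{j != i} A[i][j] * x[j].
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - off_diag) / A[i][i]
    return x


# Hypothetical 2x2 example; the exact minimizer is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = coordinate_descent(A, b)
print(x)
```

For a quadratic this update coincides with a Gauss-Seidel sweep. For general convex objectives the exact one-dimensional solve would typically be replaced by a coordinate gradient step.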

Journal Article•DOI•

Conic Optimization via Operator Splitting and Homogeneous Self-Dual Embedding

[...]

Brendan O'Donoghue¹, Eric Chu¹, Neal Parikh¹, Stephen Boyd¹•Institutions (1)

Stanford University¹

01 Jun 2016-Journal of Optimization Theory and Applications

TL;DR: In this article, the alternating directions method of multipliers is used to solve the homogeneous self-dual embedding, an equivalent feasibility problem involving finding a nonzero point in the intersection of a subspace and a cone.

Abstract: We introduce a first-order method for solving very large convex cone programs. The method uses an operator splitting method, the alternating directions method of multipliers, to solve the homogeneous self-dual embedding, an equivalent feasibility problem involving finding a nonzero point in the intersection of a subspace and a cone. This approach has several favorable properties. Compared to interior-point methods, first-order methods scale to very large problems, at the cost of requiring more time to reach very high accuracy. Compared to other first-order methods for cone programs, our approach finds both primal and dual solutions when available or a certificate of infeasibility or unboundedness otherwise, is parameter free, and the per-iteration cost of the method is the same as applying a splitting method to the primal or dual alone. We discuss efficient implementation of the method in detail, including direct and indirect methods for computing projection onto the subspace, scaling the original problem data, and stopping criteria. We describe an open-source implementation, which handles the usual (symmetric) nonnegative, second-order, and semidefinite cones as well as the (non-self-dual) exponential and power cones and their duals. We report numerical results that show speedups over interior-point cone solvers for large problems, and scaling to very large general cone programs.

597 citations

Journal Article•DOI•

A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications

[...]

Heinz H. Bauschke¹, Jérôme Bolte², Marc Teboulle³•Institutions (3)

University of British Columbia¹, University of Toulouse², Tel Aviv University³

01 May 2017-Mathematics of Operations Research

TL;DR: Introduces a framework that circumvents the intricate question of Lipschitz continuity of gradients by using an elegant, easy-to-check convexity condition that captures the geometry of the constraints.

Abstract: The proximal gradient method and its variants are among the most attractive first-order algorithms for minimizing the sum of two convex functions, with one being nonsmooth. However, the method requires the differentiable part of the objective to have a Lipschitz continuous gradient, thus precluding its use in many applications. In this paper we introduce a framework which makes it possible to circumvent the intricate question of Lipschitz continuity of gradients by using an elegant and easy-to-check convexity condition which captures the geometry of the constraints. This condition translates into a new descent lemma which in turn leads to a natural derivation of the proximal-gradient scheme with Bregman distances. We then identify a new notion of asymmetry measure for Bregman distances, which is central in determining the relevant step-size. These novelties allow us to prove a global sublinear rate of convergence, and as a by-product, global pointwise convergence is obtained. This provides a new path to a broad spectrum of problems arising in key applications which were, until now, considered as out of reach via proximal gradient methods. We illustrate this potential by showing how our results can be applied to build new and simple schemes for Poisson inverse problems.

355 citations

Journal Article•DOI•

Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

[...]

Jakub Konečný¹, Jie Liu², Peter Richtárik¹, Martin Takáč²•Institutions (2)

University of Edinburgh¹, Lehigh University²

01 Mar 2016-IEEE Journal of Selected Topics in Signal Processing

TL;DR: Proposes mS2GD, a mini-batch variant of semi-stochastic gradient descent. As long as the mini-batch size b is below a certain threshold, the method reaches any predefined accuracy with less overall work than without mini-batching, and it admits a simple parallel implementation.

Abstract: We propose mS2GD: a method incorporating a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent (S2GD). We consider the problem of minimizing a strongly convex function represented as the sum of an average of a large number of smooth convex functions and a simple nonsmooth convex regularizer. Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps. The process is repeated a few times, with the last iterate becoming the new starting point. The novelty of our method is the introduction of mini-batching into the computation of stochastic steps. In each step, instead of choosing a single function, we sample $b$ functions, compute their gradients, and compute the direction based on this. We analyze the complexity of the method and show that it benefits from two speedup effects. First, we prove that as long as $b$ is below a certain threshold, we can reach any predefined accuracy with less overall work than without mini-batching. Second, our mini-batching scheme admits a simple parallel implementation, and hence is suitable for further acceleration by parallelization.

289 citations
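
The loop structure described in the abstract above (one full-gradient step at the starting point, many mini-batch stochastic steps, then a restart from the last iterate) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the problem is a hypothetical one-dimensional least-squares fit with an L1 regularizer, the number of inner steps is fixed rather than chosen as in the paper, and the step size, batch size, and all names are made up for the example.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t * |x|, the nonsmooth regularizer assumed here."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ms2gd_sketch(a, y, lam, step, batch, inner_steps, outer_loops, seed=0):
    """Mini-batch semi-stochastic proximal gradient sketch.

    Minimizes (1/n) * sum_i 0.5 * (a[i] * x - y[i])**2 + lam * |x| over scalar x.
    """
    rng = random.Random(seed)
    n = len(a)

    def grad_i(i, x):
        # Gradient of the i-th smooth term 0.5 * (a[i] * x - y[i])**2.
        return a[i] * (a[i] * x - y[i])

    x = 0.0
    for _ in range(outer_loops):
        x_start = x
        # Deterministic step: full gradient of the smooth part at the starting point.
        full_grad = sum(grad_i(i, x_start) for i in range(n)) / n
        for _ in range(inner_steps):
            # Stochastic step: sample `batch` functions instead of a single one.
            idx = rng.sample(range(n), batch)
            correction = sum(grad_i(i, x) - grad_i(i, x_start) for i in idx) / batch
            direction = full_grad + correction
            # Proximal step handles the nonsmooth regularizer.
            x = soft_threshold(x - step * direction, step * lam)
        # The last iterate becomes the new starting point on the next pass.
    return x


# Hypothetical synthetic data: y is roughly 2 * a plus small noise.
data_rng = random.Random(1)
a = [data_rng.uniform(0.5, 1.5) for _ in range(200)]
y = [2.0 * ai + data_rng.gauss(0.0, 0.1) for ai in a]
x_hat = ms2gd_sketch(a, y, lam=0.1, step=0.05, batch=4, inner_steps=100, outer_loops=20)
print(x_hat)
```

In this scalar setting the minimizer has a closed form, `soft_threshold(mean(a*y), lam) / mean(a*a)`, which gives a simple way to check the result.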

References

Journal Article•DOI•

Rényi Divergence and Kullback-Leibler Divergence

[...]

Tim van Erven¹, Peter Harremoës•Institutions (1)

University of Paris-Sud¹

12 Jun 2014-IEEE Transactions on Information Theory

1,234 citations

Journal Article•DOI•

Coordinate descent algorithms

[...]

Stephen J. Wright¹•Institutions (1)

University of Wisconsin-Madison¹

01 Jun 2015-Mathematical Programming

TL;DR: Pays particular attention to a certain problem structure that arises frequently in machine learning applications, showing that efficient implementations of accelerated coordinate descent algorithms are possible for problems of this type.

1,198 citations

Journal Article•DOI•

A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications

[...]

Heinz H. Bauschke¹, Jérôme Bolte², Marc Teboulle³•Institutions (3)

University of British Columbia¹, University of Toulouse², Tel Aviv University³

01 May 2017-Mathematics of Operations Research

355 citations

Journal Article•DOI•

Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

[...]

Jakub Konečný¹, Jie Liu², Peter Richtárik¹, Martin Takáč²•Institutions (2)

University of Edinburgh¹, Lehigh University²

01 Mar 2016-IEEE Journal of Selected Topics in Signal Processing

289 citations

Journal Article•DOI•

DC programming and DCA: thirty years of developments

[...]

Hoai An Le Thi¹, Tao Pham Dinh•Institutions (1)

University of Lorraine¹

01 May 2018-Mathematical Programming

TL;DR: A short survey on thirty years of developments of DC (Difference of Convex functions) programming and DCA (DC Algorithms) which constitute the backbone of nonconvex programming and global optimization.

Abstract: The year 2015 marks the 30th birthday of DC (Difference of Convex functions) programming and DCA (DC Algorithms) which constitute the backbone of nonconvex programming and global optimization. In this article we offer a short survey on thirty years of developments of these theoretical and algorithmic tools. The survey is comprised of three parts. In the first part we present a brief history of the field, while in the second we summarize the state-of-the-art results and recent advances. We focus on main theoretical results and DCA solvers for important classes of difficult nonconvex optimization problems, and then give an overview of real-world applications whose solution methods are based on DCA. The third part is devoted to new trends and important open issues, as well as suggestions for future developments.

257 citations