Showing papers by "Dmitriy Drusvyatskiy published in 2018"


Journal ArticleDOI
TL;DR: The proximal gradient algorithm for minimizing the sum of a smooth and a nonsmooth convex function often converges linearly even without strong convexity; this paper explains the phenomenon by establishing the equivalence of the underlying error bound to a natural quadratic growth condition.
Abstract: The proximal gradient algorithm for minimizing the sum of a smooth and nonsmooth convex function often converges linearly even without strong convexity. One common reason is that a multiple of the step length at each iteration may linearly bound the “error”—the distance to the solution set. We explain the observed linear convergence intuitively by proving the equivalence of such an error bound to a natural quadratic growth condition. Our approach generalizes to linear and quadratic convergence analysis for proximal methods (of Gauss-Newton type) for minimizing compositions of nonsmooth functions with smooth mappings. We observe incidentally that short step-lengths in the algorithm indicate near-stationarity, suggesting a reliable termination criterion.
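A minimal sketch of the iteration and of the step-length termination test mentioned above, assuming a gradient oracle grad_f for the smooth term and a proximal map prox_g for the nonsmooth term (the LASSO instance at the end is purely illustrative):

```python
import numpy as np

def proximal_gradient(grad_f, prox_g, x0, step, tol=1e-8, max_iter=10_000):
    """Proximal gradient iteration x+ = prox_{step*g}(x - step*grad_f(x)).
    Following the paper's observation, a short step length ||x+ - x||
    is used as a certificate of near-stationarity and stops the loop."""
    x = x0
    for _ in range(max_iter):
        x_new = prox_g(x - step * grad_f(x), step)
        if np.linalg.norm(x_new - x) <= tol:  # short step => near-stationary
            return x_new
        x = x_new
    return x

# Illustrative instance: LASSO, f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1.
rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((20, 50)), rng.standard_normal(20), 0.1
grad_f = lambda x: A.T @ (A @ x - b)
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)  # soft-threshold
x = proximal_gradient(grad_f, prox_g, np.zeros(50),
                      step=1.0 / np.linalg.norm(A, 2) ** 2)  # step = 1/L
```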

235 citations


Posted Content
TL;DR: This article shows that the stochastic subgradient method, applied to any semialgebraic locally Lipschitz function, produces limit points that are all first-order stationary, giving convergence guarantees in the absence of smoothness and convexity.
Abstract: This work considers the question: what convergence guarantees does the stochastic subgradient method have in the absence of smoothness and convexity? We prove that the stochastic subgradient method, on any semialgebraic locally Lipschitz function, produces limit points that are all first-order stationary. More generally, our result applies to any function with a Whitney stratifiable graph. In particular, this work endows the stochastic subgradient method, and its proximal extension, with rigorous convergence guarantees for a wide class of problems arising in data science---including all popular deep learning architectures.
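In symbols, the method in question is the classical stochastic subgradient iteration (a standard formulation, stated here for orientation rather than quoted from the paper):

$$x_{k+1} = x_k - \alpha_k y_k, \qquad \mathbb{E}[y_k \mid x_k] \in \partial f(x_k),$$

and a limit point $\bar{x}$ is first-order stationary when $0 \in \partial f(\bar{x})$, with $\partial f$ the Clarke subdifferential.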

146 citations


Posted Content
TL;DR: It is proved that the proximal stochastic subgradient method, applied to a weakly convex problem, drives the gradient of the Moreau envelope to zero at the rate $O(k^{-1/4})$.
Abstract: We prove that the proximal stochastic subgradient method, applied to a weakly convex problem, drives the gradient of the Moreau envelope to zero at the rate $O(k^{-1/4})$. As a consequence, we resolve an open question on the convergence rate of the proximal stochastic gradient method for minimizing the sum of a smooth nonconvex function and a convex proximable function.
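For context, two standard definitions behind this statement (not restated in the abstract): a function $f$ is $\rho$-weakly convex if $f + \frac{\rho}{2}\|\cdot\|^2$ is convex, and its Moreau envelope is

$$f_\lambda(x) = \min_{y}\Big\{ f(y) + \tfrac{1}{2\lambda}\|y - x\|^2 \Big\},$$

which is smooth for $\lambda < \rho^{-1}$; the gradient norm $\|\nabla f_\lambda(x)\|$ is the stationarity measure that the method drives to zero.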

101 citations


Posted Content
TL;DR: Under reasonable conditions on the approximation quality and regularity of the models, the stochastic proximal point, proximal subgradient, and regularized Gauss-Newton methods for minimizing compositions of convex functions with smooth maps each drive a natural stationarity measure to zero at the rate $O(k^{-1/4})$.
Abstract: We consider a family of algorithms that successively sample and minimize simple stochastic models of the objective function. We show that under reasonable conditions on approximation quality and regularity of the models, any such algorithm drives a natural stationarity measure to zero at the rate $O(k^{-1/4})$. As a consequence, we obtain the first complexity guarantees for the stochastic proximal point, proximal subgradient, and regularized Gauss-Newton methods for minimizing compositions of convex functions with smooth maps. The guiding principle, underlying the complexity guarantees, is that all algorithms under consideration can be interpreted as approximate descent methods on an implicit smoothing of the problem, given by the Moreau envelope. Specializing to classical circumstances, we obtain the long-sought convergence rate of the stochastic projected gradient method, without batching, for minimizing a smooth function on a closed convex set.
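A schematic of the model-based template described above (my sketch, with all names illustrative): at each step, sample $\xi$, build a simple convex model of the objective around the current point, and minimize it with a quadratic penalty.

```python
import numpy as np

def stochastic_model_based(sample, model_argmin, x0, betas):
    """Generic template: at step k, draw xi ~ sample() and set
        x_{k+1} = argmin_y  f_{x_k}(y, xi) + (beta_k / 2) * ||y - x_k||^2.
    Linear, proximal-linear, and full models of the objective recover the
    stochastic subgradient, Gauss-Newton, and proximal point methods."""
    x = x0
    for beta in betas:
        x = model_argmin(x, sample(), beta)  # minimizer of the penalized model
    return x

# Illustrative instance: linear model f_x(y, a) = |a^T x| + g^T (y - x), with
# g a subgradient of y -> |a^T y| at x; the penalized minimizer is x - g/beta.
rng = np.random.default_rng(2)
sample = lambda: rng.standard_normal(10)

def model_argmin(x, a, beta):
    g = np.sign(a @ x) * a  # subgradient of y -> |a^T y| at x
    return x - g / beta

x = stochastic_model_based(sample, model_argmin, np.ones(10),
                           betas=[np.sqrt(k + 1) for k in range(1000)])
```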

95 citations


Posted Content
TL;DR: This work shows that subgradient methods converge linearly on sharp functions that are only weakly convex, provided that the methods are initialized within a fixed tube around the solution set.
Abstract: Subgradient methods converge linearly on a convex function that grows sharply away from its solution set. In this work, we show that the same is true for sharp functions that are only weakly convex, provided that the subgradient methods are initialized within a fixed tube around the solution set. A variety of statistical and signal processing tasks come equipped with good initialization, and provably lead to formulations that are both weakly convex and sharp. Therefore, in such settings, subgradient methods can serve as inexpensive local search procedures. We illustrate the proposed techniques on phase retrieval and covariance estimation problems.
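As an illustration of the kind of local search this enables, here is a hypothetical Polyak-type subgradient method for real phase retrieval with the sharp, weakly convex loss $f(x) = \frac{1}{m}\sum_i |\langle a_i, x\rangle^2 - b_i|$; the Polyak step uses $\min f = 0$, which holds for exact measurements. This is my own sketch of a standard scheme, not code from the paper:

```python
import numpy as np

def polyak_subgradient_phase_retrieval(A, b, x0, max_iter=500):
    """Minimize f(x) = mean_i |(a_i^T x)^2 - b_i| via Polyak subgradient
    steps x+ = x - ((f(x) - min f) / ||g||^2) * g, using min f = 0 for
    noiseless measurements.  With good initialization, the iterates stay
    in the 'tube' around the solution set where convergence is linear."""
    x = x0.copy()
    for _ in range(max_iter):
        r = (A @ x) ** 2 - b
        fval = np.mean(np.abs(r))
        if fval == 0.0:
            break
        g = A.T @ (np.sign(r) * 2 * (A @ x)) / len(b)  # a subgradient of f at x
        x -= (fval / (g @ g)) * g
    return x

# Noiseless instance: recover x_star (up to sign) from b_i = (a_i^T x_star)^2,
# starting from a small perturbation of the solution (the 'good initialization').
rng = np.random.default_rng(3)
A = rng.standard_normal((200, 20))
x_star = rng.standard_normal(20)
b = (A @ x_star) ** 2
x = polyak_subgradient_phase_retrieval(A, b, x_star + 0.1 * rng.standard_normal(20))
```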

51 citations


Proceedings Article
31 Mar 2018
TL;DR: Introduces a generic scheme for solving non-convex optimization problems with gradient-based algorithms originally designed for minimizing convex functions, and applies it to incremental algorithms such as SVRG and SAGA for sparse matrix factorization and for learning neural networks.
Abstract: We introduce a generic scheme to solve non-convex optimization problems using gradient-based algorithms originally designed for minimizing convex functions. Even though these methods may originally require convexity to operate, the proposed approach allows one to use them without assuming any knowledge about the convexity of the objective. In general, the scheme is guaranteed to produce a stationary point with a worst-case efficiency typical of first-order methods, and when the objective turns out to be convex, it automatically accelerates in the sense of Nesterov and achieves near-optimal convergence rate in function values. We conclude the paper by showing promising experimental results obtained by applying our approach to incremental algorithms such as SVRG and SAGA for sparse matrix factorization and for learning neural networks.
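A schematic of the core mechanism (my paraphrase under an assumed curvature bound, not the authors' exact scheme): adding a sufficiently strong quadratic around the current center makes each subproblem convex, so a method designed for convex problems can serve as the inner solver.

```python
import numpy as np

def convexified_outer_loop(grad_f, kappa, y0, n_outer=50, n_inner=200, lr=0.01):
    """Schematic: each outer step approximately solves
        min_x  f(x) + (kappa / 2) * ||x - y||^2,
    which is convex once kappa dominates the negative curvature of f, so any
    convex solver (here plain gradient descent standing in for SVRG/SAGA)
    can be used; the regularization center then moves to the new iterate."""
    y = y0
    for _ in range(n_outer):
        x = y.copy()
        for _ in range(n_inner):
            x -= lr * (grad_f(x) + kappa * (x - y))  # gradient of the subproblem
        y = x
    return y

# Toy nonconvex objective f(x) = ||x||^2 + cos(sum(x)); its Hessian is bounded
# below by (2 - dim) * I, so kappa = 5 convexifies the subproblems in dim 5.
grad_f = lambda x: 2 * x - np.sin(np.sum(x)) * np.ones_like(x)
y = convexified_outer_loop(grad_f, kappa=5.0, y0=np.ones(5))
```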

45 citations


Journal ArticleDOI
TL;DR: Shows that the iterate sequence of the geometric descent algorithm of Bubeck, Lee, and Singh is generated by a scheme that in each iteration computes an optimal average of quadratic lower models of the function, leading to limited-memory extensions with improved performance.
Abstract: In a recent paper, Bubeck, Lee, and Singh introduced a new first order method for minimizing smooth strongly convex functions. Their geometric descent algorithm, largely inspired by the ellipsoid method, enjoys the optimal linear rate of convergence. We show that the same iterate sequence is generated by a scheme that in each iteration computes an optimal average of quadratic lower models of the function. Indeed, the minimum of the averaged quadratic approaches the true minimum at an optimal rate. This intuitive viewpoint reveals clear connections to the original fast-gradient methods and cutting plane ideas, and leads to limited-memory extensions with improved performance.
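The building block is standard for a $\mu$-strongly convex $f$ (stated here for orientation): each gradient evaluation at a point $y$ produces the quadratic lower model

$$f(x) \;\ge\; Q_y(x) := f(y) + \langle \nabla f(y), x - y\rangle + \tfrac{\mu}{2}\|x - y\|^2,$$

and any convex combination of such models is again a lower bound on $f$. The scheme described in the abstract maintains an optimal average of these quadratics, and the minimum value of the averaged model is a lower estimate of $\min f$ that increases to the true minimum at the optimal linear rate.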

44 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show that subgradient methods converge linearly on sharp functions that are only weakly convex, provided the methods are initialized within a fixed tube around the solution set.
Abstract: Subgradient methods converge linearly on a convex function that grows sharply away from its solution set. In this work, we show that the same is true for sharp functions that are only weakly convex, provided that the subgradient methods are initialized within a fixed tube around the solution set. A variety of statistical and signal processing tasks come equipped with good initialization and provably lead to formulations that are both weakly convex and sharp. Therefore, in such settings, subgradient methods can serve as inexpensive local search procedures. We illustrate the proposed techniques on phase retrieval and covariance estimation problems.

42 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show that the partial minimization technique regularizes the problem, making it well-conditioned, and they illustrate the theory and algorithms on boundary control, optimal transport, and parameter estimation for robust dynamic inference.
Abstract: Common computational problems, such as parameter estimation in dynamic models and partial differential equation (PDE)-constrained optimization, require data fitting over a set of auxiliary parameters subject to physical constraints over an underlying state. Naive quadratically penalized formulations, commonly used in practice, suffer from inherent ill-conditioning. We show that, surprisingly, the partial minimization technique regularizes the problem, making it well-conditioned. This viewpoint sheds new light on variable projection techniques, as well as the penalty method for PDE-constrained optimization, and motivates robust extensions. In addition, we outline an inexact analysis, showing that the partial minimization subproblem can be solved very loosely in each iteration. We illustrate the theory and algorithms on boundary control, optimal transport, and parameter estimation for robust dynamic inference.
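A toy sketch of the partial-minimization (variable projection) idea on a separable least-squares fit, where the linear coefficients are eliminated by an inner solve; the exponential model and all names here are illustrative, not taken from the paper:

```python
import numpy as np

def reduced_objective(theta, t, y):
    """Variable projection for y ~ c1*exp(-theta1*t) + c2*exp(-theta2*t):
    for fixed nonlinear parameters theta, the linear coefficients c are
    eliminated by partial minimization (a least-squares solve), leaving a
    reduced, better-conditioned problem in theta alone."""
    Phi = np.exp(-np.outer(t, theta))             # design matrix for fixed theta
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # inner (partial) minimization
    r = Phi @ c - y
    return 0.5 * (r @ r), c

# Synthetic data; the reduced objective can be handed to any smooth optimizer.
rng = np.random.default_rng(4)
t = np.linspace(0.0, 3.0, 50)
y = 2.0 * np.exp(-1.0 * t) + 0.5 * np.exp(-4.0 * t) + 0.01 * rng.standard_normal(50)
val, c = reduced_objective(np.array([1.1, 3.8]), t, y)
```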

27 citations


Posted Content
TL;DR: A scheme that iteratively samples and minimizes stochastic convex models of the objective function drives a natural stationarity measure to zero at the rate $O(k^{-1/4})$.
Abstract: Given a nonsmooth, nonconvex minimization problem, we consider algorithms that iteratively sample and minimize stochastic convex models of the objective function. Assuming that the one-sided approximation quality and the variation of the models are controlled by a Bregman divergence, we show that the scheme drives a natural stationarity measure to zero at the rate $O(k^{-1/4})$. Under additional convexity and relative strong convexity assumptions, the function values converge to the minimum at the rates $O(k^{-1/2})$ and $\widetilde{O}(k^{-1})$, respectively. We discuss consequences for stochastic proximal point, mirror descent, regularized Gauss-Newton, and saddle point algorithms.
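For reference, the Bregman divergence generated by a differentiable convex function $\Phi$, together with a schematic of the update it controls (consistent with, but not quoted from, the abstract):

$$D_\Phi(y, x) = \Phi(y) - \Phi(x) - \langle \nabla \Phi(x), y - x\rangle, \qquad x_{k+1} \in \operatorname*{argmin}_{y}\Big\{ f_{x_k}(y, \xi_k) + \beta_k D_\Phi(y, x_k) \Big\},$$

where $f_{x_k}(\cdot, \xi_k)$ is the sampled convex model; taking $\Phi = \frac{1}{2}\|\cdot\|^2$ recovers the Euclidean proximal point and subgradient instances, while other choices of $\Phi$ give mirror-descent-type updates.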

23 citations


Posted Content
TL;DR: This work investigates the stochastic optimization problem of minimizing population risk, where the loss defining the risk is assumed to be weakly convex, and establishes dimension-dependent rates on subgradient estimation in full generality, as well as dimension-independent rates when the loss is a generalized linear model.
Abstract: We investigate the stochastic optimization problem of minimizing population risk, where the loss defining the risk is assumed to be weakly convex. Compositions of Lipschitz convex functions with smooth maps are the primary examples of such losses. We analyze the estimation quality of such nonsmooth and nonconvex problems by their sample average approximations. Our main results establish dimension-dependent rates on subgradient estimation in full generality and dimension-independent rates when the loss is a generalized linear model. As an application of the developed techniques, we analyze the nonsmooth landscape of a robust nonlinear regression problem.

Journal ArticleDOI
TL;DR: Revisits the foundations of gauge duality, demonstrates that it can be explained using a modern approach to duality based on a perturbation framework, and gives a direct proof that optimal solutions of the Fenchel-Rockafellar dual of the gauge dual are precisely the primal solutions rescaled by the optimal value.
Abstract: We revisit the foundations of gauge duality and demonstrate that it can be explained using a modern approach to duality based on a perturbation framework. We therefore put gauge duality and Fenchel...

Posted Content
TL;DR: In this article, a stochastic subgradient method for minimizing a convex function with the improved rate $\widetilde{O}(k^{-1/2})$ was presented.
Abstract: In a recent paper, we showed that the stochastic subgradient method, applied to a weakly convex problem, drives the gradient of the Moreau envelope to zero at the rate $O(k^{-1/4})$. In this supplementary note, we present a stochastic subgradient method for minimizing a convex function, with the improved rate $\widetilde O(k^{-1/2})$.

Posted Content
TL;DR: In this article, the authors show that computationally cheap inexact projections may suffice in place of exact projections onto nonconvex sets: if one set is defined by sufficiently regular smooth constraints, then projecting onto the approximation obtained by linearizing those constraints around the current iterate suffices.
Abstract: Given two arbitrary closed sets in Euclidean space, a simple transversality condition guarantees that the method of alternating projections converges locally, at linear rate, to a point in the intersection. Exact projection onto nonconvex sets is typically intractable, but we show that computationally-cheap inexact projections may suffice instead. In particular, if one set is defined by sufficiently regular smooth constraints, then projecting onto the approximation obtained by linearizing those constraints around the current iterate suffices. On the other hand, if one set is a smooth manifold represented through local coordinates, then the approximate projection resulting from linearizing the coordinate system around the preceding iterate on the manifold also suffices.
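A sketch of the inexact alternating projections idea for one concrete case (my construction, not the paper's code): one set is the unit sphere, whose exact projection is cheap, and the other is a smooth constraint set $\{x : c(x) = 0\}$, onto which we project approximately by projecting onto the linearization $\{y : c(x) + \nabla c(x)^T (y - x) = 0\}$ at the current iterate:

```python
import numpy as np

def inexact_alternating_projections(c, grad_c, proj_sphere, x0, n_iter=100):
    """Alternate between an exact projection onto one set (a sphere) and a
    cheap inexact projection onto {x : c(x) = 0}, obtained by projecting
    onto the linearization of the constraint at the current iterate."""
    x = x0
    for _ in range(n_iter):
        # Inexact step: project x onto the hyperplane {y : c(x) + g^T (y - x) = 0}.
        g = grad_c(x)
        x = x - (c(x) / (g @ g)) * g
        # Exact step: project onto the unit sphere.
        x = proj_sphere(x)
    return x

# Toy instance in the plane: intersect the unit sphere with the smooth set
# {x : x1 * x2 - 0.25 = 0}; the two sets meet transversally, so the method
# converges locally at a linear rate.
c = lambda x: x[0] * x[1] - 0.25
grad_c = lambda x: np.array([x[1], x[0]])
proj_sphere = lambda x: x / np.linalg.norm(x)
x = inexact_alternating_projections(c, grad_c, proj_sphere, np.array([1.0, 0.3]))
```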