
Showing papers on "Rate of convergence" published in 2017


Journal ArticleDOI
TL;DR: New complexity bounds are proved for methods of convex optimization based only on computation of the function value; such methods usually need at most n times more iterations than the standard gradient methods, where n is the dimension of the space of variables.
Abstract: In this paper, we prove new complexity bounds for methods of convex optimization based only on computation of the function value. The search directions of our schemes are normally distributed random Gaussian vectors. It appears that such methods usually need at most n times more iterations than the standard gradient methods, where n is the dimension of the space of variables. This conclusion is true for both nonsmooth and smooth problems. For the latter class, we present also an accelerated scheme with the expected rate of convergence $O(n^2/k^2)$, where k is the iteration counter. For stochastic optimization, we propose a zero-order scheme and justify its expected rate of convergence $O(n/k^{1/2})$. We give also some bounds for the rate of convergence of the random gradient-free methods to stationary points of nonconvex functions, for both smooth and nonsmooth cases. Our theoretical results are supported by preliminary computational experiments.
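
As a concrete illustration of the random gradient-free oracle described above, here is a minimal Python sketch; the test function, smoothing radius, and step size are illustrative assumptions, not values from the paper:

```python
import numpy as np

def gaussian_zero_order_grad(f, x, mu=1e-4, rng=None):
    """Two-point zero-order gradient estimate along a random Gaussian direction:
    g = (f(x + mu*u) - f(x)) / mu * u, with u ~ N(0, I_n)."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x)) / mu * u

# Toy usage: minimize a smooth quadratic without ever evaluating its gradient.
f = lambda v: 0.5 * np.dot(v, v)
x = np.ones(10)
rng = np.random.default_rng(0)
for k in range(2000):
    x -= 0.01 * gaussian_zero_order_grad(f, x, rng=rng)
print(f(x))  # close to 0
```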

859 citations


Journal ArticleDOI
TL;DR: This paper introduces a distributed algorithm, referred to as DIGing, based on a combination of a distributed inexact gradient method and a gradient tracking technique that converges to a global and consensual minimizer over time-varying graphs.
Abstract: This paper considers the problem of distributed optimization over time-varying graphs. For the case of undirected graphs, we introduce a distributed algorithm, referred to as DIGing, based on a combination of a distributed inexact gradient method and a gradient tracking technique. The DIGing algorithm uses doubly stochastic mixing matrices and employs fixed step-sizes and, yet, drives all the agents' iterates to a global and consensual minimizer. When the graphs are directed, in which case the implementation of doubly stochastic mixing matrices is unrealistic, we construct an algorithm that incorporates the push-sum protocol into the DIGing structure, thus obtaining the Push-DIGing algorithm. Push-DIGing uses column stochastic matrices and fixed step-sizes, but it still converges to a global and consensual minimizer. Under the strong convexity assumption, we prove that the algorithms converge at R-linear (geometric) rates as long as the step-sizes do not exceed some upper bounds. We establish explicit estimates for the convergence rates.
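
The DIGing update is compact enough to sketch. Below is a minimal, hypothetical illustration on a fixed ring graph with scalar local quadratics (the paper handles time-varying graphs and general strongly convex costs); the mixing matrix, step size, and data are our assumptions:

```python
import numpy as np

# Each agent i holds f_i(x) = 0.5*(x - b_i)^2, so the global minimizer is the
# average of the b_i.  W is a doubly stochastic mixing matrix for a fixed ring.
n = 5
b = np.arange(n, dtype=float)            # local data
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

grad = lambda x: x - b                   # stacked local gradients
x = np.zeros(n)                          # one scalar iterate per agent
y = grad(x)                              # gradient tracker, y_0 = grad(x_0)
alpha = 0.1
for k in range(200):
    x_new = W @ x - alpha * y            # consensus step minus tracked gradient
    y = W @ y + grad(x_new) - grad(x)    # track the average gradient
    x = x_new
print(x)  # all entries close to b.mean() == 2.0
```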

795 citations


Journal ArticleDOI
TL;DR: In this paper, the stochastic average gradient (SAG) method is analyzed for optimizing the sum of a finite number of smooth convex functions; by incorporating a memory of previous gradient values, it achieves a faster convergence rate than black-box SG methods.
Abstract: We analyze the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method's iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values the SAG method achieves a faster convergence rate than black-box SG methods. The convergence rate is improved from $O(1/\sqrt{k})$ to $O(1/k)$ in general, and when the sum is strongly convex the convergence rate is improved from the sub-linear $O(1/k)$ to a linear convergence rate of the form $O(\rho^k)$ for $\rho < 1$. Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations. This extends our earlier work Le Roux et al. (Adv Neural Inf Process Syst, 2012), which only led to a faster rate for well-conditioned strongly convex problems. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.
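
A minimal sketch of the SAG update on a toy least-squares sum: a table of the most recent per-term gradients is kept, one entry is refreshed per iteration, and the step uses the table average. The step size below is a heuristic assumption, not the paper's constant:

```python
import numpy as np

# f(x) = (1/m) * sum_i 0.5*(a_i^T x - b_i)^2
rng = np.random.default_rng(0)
m, d = 100, 5
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)

x = np.zeros(d)
table = np.zeros((m, d))                 # last seen gradient of each term
avg = table.mean(axis=0)
step = 1.0 / (4 * np.max(np.sum(A**2, axis=1)))  # heuristic step size
for k in range(20000):
    i = rng.integers(m)
    g_new = (A[i] @ x - b[i]) * A[i]
    avg += (g_new - table[i]) / m        # maintain the running average in O(d)
    table[i] = g_new
    x -= step * avg
print(np.linalg.norm(A.T @ (A @ x - b)) / m)  # near-zero gradient norm
```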

769 citations


Journal ArticleDOI
TL;DR: This paper establishes the global R-linear convergence of the ADMM for minimizing the sum of any number of convex separable functions, assuming that a certain error bound condition holds true and the dual stepsize is sufficiently small.
Abstract: We analyze the convergence rate of the alternating direction method of multipliers (ADMM) for minimizing the sum of two or more nonsmooth convex separable functions subject to linear constraints. Previous analysis of the ADMM typically assumes that the objective function is the sum of only two convex functions defined on two separable blocks of variables even though the algorithm works well in numerical experiments for three or more blocks. Moreover, there has been no rate of convergence analysis for the ADMM without strong convexity in the objective function. In this paper we establish the global R-linear convergence of the ADMM for minimizing the sum of any number of convex separable functions, assuming that a certain error bound condition holds true and the dual stepsize is sufficiently small. Such an error bound condition is satisfied for example when the feasible set is a compact polyhedron and the objective function consists of a smooth strictly convex function composed with a linear mapping, and a nonsmooth $\ell_1$ regularizer. This result implies the linear convergence of the ADMM for contemporary applications such as LASSO without assuming strong convexity of the objective function.
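
For the LASSO application mentioned at the end, here is a minimal sketch of scaled two-block ADMM; the problem data, lam, and rho are illustrative assumptions:

```python
import numpy as np

# Scaled-form two-block ADMM for LASSO:
#   min 0.5*||A x - b||^2 + lam*||z||_1   s.t.  x - z = 0.
rng = np.random.default_rng(0)
m, d, lam, rho = 50, 20, 0.1, 1.0
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)  # prox of t*||.||_1
AtA, Atb = A.T @ A, A.T @ b
x, z, u = np.zeros(d), np.zeros(d), np.zeros(d)
for k in range(500):
    x = np.linalg.solve(AtA + rho * np.eye(d), Atb + rho * (z - u))  # x-update
    z = soft(x + u, lam / rho)                                       # z-update
    u += x - z                                                       # dual update
print(np.max(np.abs(x - z)))  # primal residual, should be tiny
```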

705 citations


Journal ArticleDOI
TL;DR: The final convergence result shows clearly how the regularity of the solution and the grading of the mesh affect the order of convergence of the difference scheme, so one can choose an optimal mesh grading.
Abstract: A reaction-diffusion problem with a Caputo time derivative of order $\alpha\in (0,1)$ is considered. The solution of such a problem is shown in general to have a weak singularity near the initial time $t=0$, and sharp pointwise bounds on certain derivatives of this solution are derived. A new analysis of a standard finite difference method for the problem is given, taking into account this initial singularity. This analysis encompasses both uniform meshes and meshes that are graded in time, and includes new stability and consistency bounds. The final convergence result shows clearly how the regularity of the solution and the grading of the mesh affect the order of convergence of the difference scheme, so one can choose an optimal mesh grading. Numerical results are presented that confirm the sharpness of the error analysis.
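
Since the convergence order hinges on the mesh grading, it helps to see what a graded temporal mesh looks like. A minimal sketch; the values of T, M, and alpha are illustrative, and the grading r = (2 - alpha)/alpha is the kind of choice such analyses identify as optimal, stated here as an assumption rather than a quote from the paper:

```python
import numpy as np

# Graded mesh on [0, T]: t_j = T * (j/M)**r clusters points near t = 0, where
# the solution has its weak singularity; r = 1 recovers the uniform mesh.
T, M, alpha = 1.0, 10, 0.4
r = (2 - alpha) / alpha          # a commonly used grading for such problems
t = T * (np.arange(M + 1) / M) ** r
print(np.round(t, 5))
print(np.round(np.diff(t), 5))   # step sizes grow away from t = 0
```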

573 citations


Proceedings Article
06 Aug 2017
TL;DR: In this article, the authors show that perturbed gradient descent can escape saddle points almost for free, in a number of iterations which depends only poly-logarithmically on dimension.
Abstract: This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free"). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. When all saddle points are non-degenerate, all second-order stationary points are local minima, and our result thus shows that perturbed gradient descent can escape saddle points almost for free. Our results can be directly applied to many machine learning applications, including deep learning. As a particular concrete example of such an application, we show that our results can be used directly to establish sharp global convergence rates for matrix factorization. Our results rely on a novel characterization of the geometry around saddle points, which may be of independent interest to the non-convex optimization community.
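
A minimal sketch of the perturbation mechanism described above: plain gradient descent plus an occasional uniform-ball perturbation at near-stationary points. The thresholds, radius, and test function are illustrative assumptions, and the paper's full algorithm includes safeguards omitted here:

```python
import numpy as np

def perturbed_gd(grad, x, eta=0.01, g_thresh=1e-3, radius=1e-2,
                 t_thresh=50, n_iters=10000, rng=None):
    """When the gradient is small (a candidate saddle) and no perturbation
    happened recently, add noise drawn uniformly from a small ball, then
    continue with plain gradient descent."""
    rng = rng or np.random.default_rng(0)
    last_perturb = -np.inf
    for t in range(n_iters):
        if np.linalg.norm(grad(x)) < g_thresh and t - last_perturb > t_thresh:
            xi = rng.standard_normal(x.shape)
            xi *= radius * rng.random() ** (1.0 / x.size) / np.linalg.norm(xi)
            x = x + xi                   # uniform-in-ball perturbation
            last_perturb = t
        x = x - eta * grad(x)
    return x

# f(w) = 0.5*w0^2 + 0.25*w1^4 - 0.5*w1^2 has a strict saddle at the origin and
# minima at (0, +/-1); plain GD started at 0 stays stuck, the perturbed run escapes.
grad = lambda w: np.array([w[0], w[1] ** 3 - w[1]])
print(perturbed_gd(grad, np.zeros(2)))  # ends near (0, +/-1)
```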

280 citations


Journal ArticleDOI
TL;DR: If each agent is asymptotically null controllable with bounded controls and the interaction topology described by a signed digraph is structurally balanced and contains a spanning tree, then the semi-global bipartite consensus can be achieved for the linear multiagent system by a linear feedback controller with the control gain being designed via the low gain feedback technique.
Abstract: The bipartite consensus problem for a group of homogeneous generic linear agents with input saturation under directed interaction topology is examined. It is established that if each agent is asymptotically null controllable with bounded controls and the interaction topology described by a signed digraph is structurally balanced and contains a spanning tree, then the semi-global bipartite consensus can be achieved for the linear multiagent system by a linear feedback controller with the control gain being designed via the low gain feedback technique. The convergence analysis of the proposed control strategy is performed by means of the Lyapunov method, which can also specify the convergence rate. Finally, the validity of the theoretical findings is demonstrated by two simulation examples.

272 citations


Journal ArticleDOI
TL;DR: In this article, an algorithm for non-convex optimization with global convergence (of the whole sequence) to a critical point is proposed, in which the variables of the underlying problem are treated either as one block or as multiple disjoint blocks.
Abstract: Nonconvex optimization arises in many areas of computational science and engineering. However, most nonconvex optimization algorithms are only known to have local convergence or subsequence convergence properties. In this paper, we propose an algorithm for nonconvex optimization and establish its global convergence (of the whole sequence) to a critical point. In addition, we give its asymptotic convergence rate and numerically demonstrate its efficiency. In our algorithm, the variables of the underlying problem are either treated as one block or multiple disjoint blocks. It is assumed that each non-differentiable component of the objective function, or each constraint, applies only to one block of variables. The differentiable components of the objective function, however, can involve multiple blocks of variables together. Our algorithm updates one block of variables at a time by minimizing a certain prox-linear surrogate, along with an extrapolation to accelerate its convergence. The order of update can be either deterministically cyclic or randomly shuffled for each cycle. In fact, our convergence analysis only needs that each block be updated at least once in every fixed number of iterations. We show its global convergence (of the whole sequence) to a critical point under fairly loose conditions including, in particular, the Kurdyka–Łojasiewicz condition, which is satisfied by a broad class of nonconvex/nonsmooth applications. These results, of course, remain valid when the underlying problem is convex. We apply our convergence results to the coordinate descent iteration for non-convex regularized linear regression, as well as a modified rank-one residue iteration for nonnegative matrix factorization. We show that both applications have global convergence. Numerically, we tested our algorithm on nonnegative matrix and tensor factorization problems, where random shuffling clearly improves the chance to avoid low-quality local solutions.
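
A minimal sketch of the block update pattern described above (a prox-linear step per block plus extrapolation), on a toy rank-1 nonnegative factorization; the extrapolation weight and data are illustrative assumptions, and the paper's adaptive weights and safeguards are omitted:

```python
import numpy as np

# min_{x,y >= 0} 0.5*||M - x y^T||_F^2: each block is updated by a gradient
# step on the smooth coupling term followed by projection onto the nonnegative
# orthant (the prox of the indicator), with a simple extrapolation.
rng = np.random.default_rng(0)
M = np.outer(rng.random(6), rng.random(4))   # exactly rank-1 nonnegative data
x, y = rng.random(6), rng.random(4)
x_old, y_old = x.copy(), y.copy()
w = 0.5                                      # extrapolation weight (heuristic)
for k in range(500):
    x_hat = x + w * (x - x_old)              # extrapolate block x
    Lx = max(y @ y, 1e-8)                    # block Lipschitz constant
    x_old, x = x, np.maximum(x_hat - ((x_hat[:, None] * y - M) @ y) / Lx, 0.0)
    y_hat = y + w * (y - y_old)              # extrapolate block y
    Ly = max(x @ x, 1e-8)
    y_old, y = y, np.maximum(y_hat - ((x[:, None] * y_hat - M).T @ x) / Ly, 0.0)
print(np.linalg.norm(M - np.outer(x, y)))    # residual close to 0
```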

259 citations


Posted Content
TL;DR: The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks.
Abstract: Stochastic Gradient Langevin Dynamics (SGLD) is a popular variant of Stochastic Gradient Descent, where properly scaled isotropic Gaussian noise is added to an unbiased estimate of the gradient at each iteration. This modest change allows SGLD to escape local minima and suffices to guarantee asymptotic convergence to global minimizers for sufficiently regular non-convex objectives (Gelfand and Mitter, 1991). The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks. As in the asymptotic setting, our analysis relates the discrete-time SGLD Markov chain to a continuous-time diffusion process. A new tool that drives the results is the use of weighted transportation cost inequalities to quantify the rate of convergence of SGLD to a stationary distribution in the Euclidean $2$-Wasserstein distance.
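
The SGLD step itself is one line. A minimal sketch on a toy least-squares objective; the minibatch size, step size, and inverse temperature beta are illustrative assumptions:

```python
import numpy as np

# SGLD: an SGD step on a minibatch gradient plus isotropic Gaussian noise
# scaled by sqrt(2*eta/beta), where beta is the inverse temperature.
rng = np.random.default_rng(0)
m, d = 200, 3
A = rng.standard_normal((m, d))
b = A @ np.ones(d) + 0.1 * rng.standard_normal(m)

def minibatch_grad(x, batch=10):
    idx = rng.integers(m, size=batch)
    return A[idx].T @ (A[idx] @ x - b[idx]) / batch

x, eta, beta = np.zeros(d), 1e-3, 1e4
for k in range(5000):
    x = x - eta * minibatch_grad(x) + np.sqrt(2 * eta / beta) * rng.standard_normal(d)
print(x)  # hovers near the least-squares solution (all ones) for large beta
```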

251 citations


Proceedings ArticleDOI
19 Jun 2017
TL;DR: Katyusha, as discussed by the authors, is a direct, primal-only stochastic gradient method whose main ingredient is a novel "negative momentum" on top of Nesterov's momentum that can be incorporated into a variance-reduction-based algorithm to speed it up.
Abstract: Nesterov's momentum trick is famously known for accelerating gradient descent, and has been proven useful in building fast iterative algorithms. However, in the stochastic setting, counterexamples exist and prevent Nesterov's momentum from providing similar acceleration, even if the underlying problem is convex. We introduce Katyusha, a direct, primal-only stochastic gradient method to fix this issue. It has a provably accelerated convergence rate in convex (off-line) stochastic optimization. The main ingredient is Katyusha momentum, a novel "negative momentum" on top of Nesterov's momentum that can be incorporated into a variance-reduction based algorithm and speed it up. Since variance reduction has been successfully applied to a growing list of practical problems, our paper suggests that in each of such cases, one could potentially give Katyusha a hug.
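
A simplified sketch of the Katyusha scheme as described, combining an SVRG-style variance-reduced estimator with the extra momentum coupling through the anchor point; the parameter recipe follows the paper's stated choices up to our simplifications (uniform rather than weighted epoch averaging, toy ridge-regression data):

```python
import numpy as np

# f(x) = (1/m) sum_i f_i(x),  f_i(x) = 0.5*(a_i^T x - b_i)^2 + 0.5*sigma*||x||^2
rng = np.random.default_rng(0)
m, d, sigma = 100, 10, 0.1
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)
L = np.max(np.sum(A**2, axis=1)) + sigma       # crude smoothness bound

grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i] + sigma * x
full_grad = lambda x: A.T @ (A @ x - b) / m + sigma * x

tau1 = min(np.sqrt(m * sigma / (3 * L)), 0.5)  # paper-style parameter recipe
tau2 = 0.5
alpha = 1.0 / (3 * tau1 * L)
x_tilde, y, z = np.zeros(d), np.zeros(d), np.zeros(d)
for s in range(30):                            # outer epochs
    mu = full_grad(x_tilde)                    # variance-reduction anchor
    y_sum = np.zeros(d)
    for j in range(2 * m):                     # inner loop of length 2m
        x = tau1 * z + tau2 * x_tilde + (1 - tau1 - tau2) * y
        i = rng.integers(m)
        g = mu + grad_i(x, i) - grad_i(x_tilde, i)   # SVRG-style estimator
        z = z - alpha * g                      # Katyusha momentum lives in z
        y = x - g / (3 * L)
        y_sum += y
    x_tilde = y_sum / (2 * m)                  # uniform average (simplification)
print(np.linalg.norm(full_grad(x_tilde)))      # near-zero gradient norm
```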

Journal ArticleDOI
TL;DR: In this article, a sampling technique based on the Euler discretization of the Langevin stochastic differential equation is studied, and for both constant and decreasing step sizes, non-asymptotic bounds for the convergence to stationarity in both total variation and Wasserstein distances are obtained.
Abstract: Sampling a distribution over a high-dimensional state space is a problem which has recently attracted a lot of research effort; applications include Bayesian non-parametrics, Bayesian inverse problems and aggregation of estimators. All these problems boil down to sampling a target distribution $\pi$ having a density with respect to the Lebesgue measure on $\mathbb{R}^d$, known up to a normalisation factor $x \mapsto \mathrm{e}^{-U(x)}/\int_{\mathbb{R}^d} \mathrm{e}^{-U(y)} \mathrm{d} y$ where $U$ is continuously differentiable and smooth. In this paper, we study a sampling technique based on the Euler discretization of the Langevin stochastic differential equation. Contrary to the Metropolis Adjusted Langevin Algorithm (MALA), we do not apply a Metropolis-Hastings correction. We obtain, for both constant and decreasing step sizes in the Euler discretization, non-asymptotic bounds for the convergence to stationarity in both total variation and Wasserstein distances. Particular attention is paid to the dependence on the dimension of the state space, to demonstrate the applicability of this method in the high-dimensional setting, at least when $U$ is convex. These bounds are based on recently obtained estimates of the convergence of the Langevin diffusion to stationarity using Poincaré and log-Sobolev inequalities. These bounds improve and extend the results of (Dalalyan, 2014). We also investigate the convergence of an appropriately weighted empirical measure and we report sharp bounds for the mean square error and exponential deviation inequality for Lipschitz functions. A limited Monte Carlo experiment is carried out to support our findings.

Journal Article
TL;DR: This work proposes a stochastic primal-dual coordinate method, which alternates between maximizing over one (or more) randomly chosen dual variables and minimizing over the primal variables, and develops an extension to non-smooth and non-strongly convex loss functions.
Abstract: We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors. The problem structure allows us to reformulate it as a convex-concave saddle point problem. We propose a stochastic primal-dual coordinate (SPDC) method, which alternates between maximizing over a randomly chosen dual variable and minimizing over the primal variables. An extrapolation step on the primal variables is performed to obtain accelerated convergence rate. We also develop a mini-batch version of the SPDC method which facilitates parallel computing, and an extension with weighted sampling probabilities on the dual variables, which has a better complexity than uniform sampling on unnormalized data. Both theoretically and empirically, we show that the SPDC method has comparable or better performance than several state-of-the-art optimization methods.

Journal ArticleDOI
TL;DR: In this article, global linear convergence rate bounds for Douglas-Rachford splitting and ADMM are shown under strong convexity and smoothness assumptions, and the bounds are tight for the class of problems under consideration for all feasible algorithm parameters.
Abstract: Recently, several convergence rate results for Douglas-Rachford splitting and the alternating direction method of multipliers (ADMM) have been presented in the literature. In this paper, we show global linear convergence rate bounds for Douglas-Rachford splitting and ADMM under strong convexity and smoothness assumptions. We further show that the rate bounds are tight for the class of problems under consideration for all feasible algorithm parameters. For problems that satisfy the assumptions, we show how to select step-size and metric for the algorithm that optimize the derived convergence rate bounds. For problems with a similar structure that do not satisfy the assumptions, we present heuristic step-size and metric selection methods.
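
A minimal sketch of the Douglas-Rachford iteration the analysis covers, on a toy problem with two prox-friendly terms; the choice of f, g, and gamma is illustrative:

```python
import numpy as np

# min_x f(x) + g(x) with f = 0.5*||x - c||^2 and g = indicator of the box
# [-1, 1]^d.  The iteration is
#   x = prox_{gamma f}(z);  z = z + prox_{gamma g}(2x - z) - x,
# and x converges to the constrained minimizer clip(c, -1, 1).
c = np.array([2.0, -0.3, 0.7])
gamma = 1.0
prox_f = lambda v: (v + gamma * c) / (1 + gamma)  # prox of 0.5*||.-c||^2
prox_g = lambda v: np.clip(v, -1.0, 1.0)          # projection onto the box

z = np.zeros_like(c)
for k in range(100):
    x = prox_f(z)
    z = z + prox_g(2 * x - z) - x
print(x)  # -> [1.0, -0.3, 0.7]
```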

Journal ArticleDOI
TL;DR: Semi-Stochastic Gradient Descent (S2GD), as mentioned in this paper, runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients are computed, following a geometric law.
Abstract: In this paper we study the problem of minimizing the average of a large number of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients are computed, following a geometric law. For strongly convex objectives, the method converges linearly. The total work needed for the method to output an $\epsilon$-accurate solution in expectation, measured in the number of passes over data, is proportional to the condition number of the problem and inversely proportional to the number of functions forming the average. This is achieved by running the method with a number of stochastic gradient evaluations per epoch proportional to the conditioning of the problem. The SVRG method of Johnson and Zhang arises as a special case. As an illustration of the theoretical results, S2GD needs a workload equivalent to only about 2.1 full gradient evaluations to find a $10^{-6}$-accurate solution for a problem with $10^9$ functions and a condition number of $10^3$.
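
A minimal sketch of one S2GD-style run: a full gradient per epoch followed by a random number of variance-reduced inner steps. The simple geometric draw below stands in for the paper's precise distribution, and the data and step size are illustrative assumptions:

```python
import numpy as np

# Inner steps use the variance-reduced estimator
#   g = grad_i(y) - grad_i(x) + full_grad(x).
rng = np.random.default_rng(0)
m, d = 200, 5
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)

grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]
full_grad = lambda x: A.T @ (A @ x - b) / m

h = 0.1 / np.max(np.sum(A**2, axis=1))     # heuristic step size
x = np.zeros(d)
for epoch in range(30):
    mu = full_grad(x)
    y = x.copy()
    n_inner = min(rng.geometric(1.0 / m), 2 * m)  # geometric-law epoch length
    for j in range(n_inner):
        i = rng.integers(m)
        y -= h * (grad_i(y, i) - grad_i(x, i) + mu)
    x = y
print(np.linalg.norm(full_grad(x)))  # small gradient norm
```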

Journal ArticleDOI
TL;DR: In this article, a differentially private Laplacian consensus algorithm was proposed for the multi-agent average consensus problem under the requirement of differential privacy of the agents' initial states against an adversary that has access to all the messages.

Journal ArticleDOI
TL;DR: In this article, the integral version of the Dirichlet homogeneous fractional Laplace equation is considered and the optimal order of convergence for the standard linear finite element method is proved for quasi-uniform as well as graded meshes.
Abstract: This paper deals with the integral version of the Dirichlet homogeneous fractional Laplace equation. For this problem weighted and fractional Sobolev a priori estimates are provided in terms of the Hölder regularity of the data. By relying on these results, optimal order of convergence for the standard linear finite element method is proved for quasi-uniform as well as graded meshes. Some numerical examples are given showing results in agreement with the theoretical predictions.

Journal ArticleDOI
TL;DR: Proper correction formulas at the starting steps of the $k$-step BDF convolution quadrature for discretizing evolution equations are developed to restore the desired $k$th-order convergence rate.
Abstract: We develop proper correction formulas at the starting $k-1$ steps to restore the desired $k$th-order convergence rate of the $k$-step BDF convolution quadrature for discretizing evolution equations...

Journal ArticleDOI
TL;DR: A new distributed algorithm based on the alternating direction method of multipliers (ADMM) is proposed to minimize a sum of locally known convex functions using communication over a network; the analysis highlights the effect of network and communication weights on the convergence rate through the degrees of the nodes, the smallest nonzero eigenvalue, and the operator norm of the communication matrix.
Abstract: We propose a new distributed algorithm based on alternating direction method of multipliers (ADMM) to minimize sum of locally known convex functions using communication over a network. This optimization problem emerges in many applications in distributed machine learning and statistical estimation. Our algorithm allows for a general choice of the communication weight matrix, which is used to combine the iterates at different nodes. We show that when functions are convex, both the objective function values and the feasibility violation converge with rate $O(1/T)$ , where $T$ is the number of iterations. We then show that when functions are strongly convex and have Lipschitz continuous gradients, the sequence generated by our algorithm converges linearly to the optimal solution. In particular, an $\epsilon$ -optimal solution can be computed with $O\left(\sqrt{\kappa _f} \log (1/\epsilon) \right)$ iterations, where $\kappa _f$ is the condition number of the problem. Our analysis highlights the effect of network and communication weights on the convergence rate through degrees of the nodes, the smallest nonzero eigenvalue, and operator norm of the communication matrix.

Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of distributed learning where a network of agents collectively aim to agree on a hypothesis that best explains a set of distributed observations of conditionally independent random processes.
Abstract: We consider the problem of distributed learning, where a network of agents collectively aim to agree on a hypothesis that best explains a set of distributed observations of conditionally independent random processes. We propose a distributed algorithm and establish consistency, as well as a nonasymptotic, explicit, and geometric convergence rate for the concentration of the beliefs around the set of optimal hypotheses. Additionally, if the agents interact over static networks, we provide an improved learning protocol with better scalability with respect to the number of nodes in the network.

Journal ArticleDOI
TL;DR: Considering the fact that the infrared small target is always brighter than its adjacent background, an additional non-negativity constraint on the sparse target patch-image is proposed, which not only removes more undesirable components but also accelerates the convergence rate.

Journal ArticleDOI
TL;DR: A novel, linear, second-order semi-discrete scheme in time to solve the governing system of equations in the hydrodynamic Q-tensor model, developed following the novel 'energy quadratization' strategy so that it is linear and unconditionally energy stable at the semi-discrete level.

Journal ArticleDOI
TL;DR: This paper investigates the recursive parameter and state estimation algorithms for a special class of nonlinear systems (i.e., bilinear state space systems) by using the gradient search and proposes a state observer-based stochastic gradient algorithm and three algorithms derived by means of the multi-innovation theory.
Abstract: This paper investigates the recursive parameter and state estimation algorithms for a special class of nonlinear systems (i.e., bilinear state space systems). A state observer-based stochastic gradient (O-SG) algorithm is presented for the bilinear state space systems by using the gradient search. In order to improve the parameter estimation accuracy and the convergence rate of the O-SG algorithm, a state observer-based multi-innovation stochastic gradient algorithm and a state observer-based recursive least squares identification algorithm are derived by means of the multi-innovation theory. Finally, a numerical example is provided to demonstrate the effectiveness of the proposed algorithms.

Journal ArticleDOI
Peng Hao, Yutian Wang, Chen Liu, Bo Wang, Hao Wu
TL;DR: In this paper, an efficient and robust algorithm for non-probabilistic reliability-based design optimization (NRBDO) is proposed based on the convex model, where the inner loop concerns a min-max problem for the evaluation of the reliability index, and an enhanced chaos control (ECC) method is developed on the basis of chaotic dynamics theory.

Posted Content
TL;DR: In this paper, the authors study several classes of stochastic optimization algorithms enriched with heavy ball momentum and prove global nonasymptotic linear convergence rates for all methods and various measures of success.
Abstract: In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent. We prove global nonasymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates (in the L2 sense), and dual function values. We also show that the primal iterates converge at an accelerated linear rate in the L1 sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., the stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.
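
A minimal sketch of the stochastic heavy ball method analyzed above, i.e., an SGD step plus the momentum term beta*(x_k - x_{k-1}); the least-squares data and parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 100, 5
A = rng.standard_normal((m, d))
b = A @ np.ones(d)                      # consistent system, solution = all ones

alpha, beta = 0.01, 0.5                 # heuristic step size and momentum
x, x_prev = np.zeros(d), np.zeros(d)
for k in range(30000):
    i = rng.integers(m)
    g = (A[i] @ x - b[i]) * A[i]        # stochastic gradient of 0.5*(a_i^T x - b_i)^2
    x, x_prev = x - alpha * g + beta * (x - x_prev), x
print(np.linalg.norm(x - np.ones(d)))   # close to 0 for a consistent system
```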

Journal ArticleDOI
TL;DR: A Fourier pseudo-spectral method that conserves mass and energy is developed for a two-dimensional nonlinear Schrödinger equation, and the method is proved to attain its optimal rate of convergence in the discrete $L^2$ norm without any restrictions on the grid ratio.

Journal ArticleDOI
TL;DR: The proposed Newton-based extremum seeking approach removes the dependence of the convergence rate on the unknown Hessian of the nonlinear map to be optimized, making it user-assignable as in the delay-free literature.
Abstract: In this paper, we address the design and analysis of multi-variable extremum seeking for static maps subject to arbitrarily long time delays. Both Gradient and Newton-based methods are considered. Multi-input systems with different time delays in each individual input channel as well as output delays are dealt with. The phase compensation of the dither signals and the inclusion of predictor feedback with a perturbation-based (averaging-based) estimate of the Hessian allow us to obtain local exponential convergence results to a small neighborhood of the optimal point, even in the presence of delays. The stability analysis is carried out using backstepping transformation and averaging in infinite dimensions, capturing the infinite-dimensional state due to the time delay. In particular, a new backstepping-like transformation is introduced to design the predictor for the Gradient-based extremum seeking scheme with multiple and distinct input delays. The proposed Newton-based extremum seeking approach removes the dependence of the convergence rate on the unknown Hessian of the nonlinear map to be optimized, making it user-assignable as in the delay-free literature. A source seeking example illustrates the performance of the proposed delay-compensated extremum seeking schemes.
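
The gradient scheme (in its delay-free baseline form) reduces to a short simulation loop: a sinusoidal dither probes the unknown map and demodulating the output with the same sinusoid yields a gradient estimate. A minimal, hypothetical sketch; the map Q, dither parameters, and gain are our assumptions:

```python
import numpy as np

Q = lambda th: 5.0 - 0.5 * (th - 2.0) ** 2   # unknown map, maximum at theta = 2
dt, omega, a, k = 1e-3, 50.0, 0.1, 2.0       # dither frequency/amplitude, gain
theta_hat, t = 0.0, 0.0
for n in range(60000):                        # 60 s of simulated time
    y = Q(theta_hat + a * np.sin(omega * t))  # perturbed measurement
    theta_hat += dt * k * np.sin(omega * t) * y  # demodulate and integrate
    t += dt
print(theta_hat)  # close to the optimum 2.0
```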

Journal ArticleDOI
TL;DR: The analysis indicates that the parameter estimates given by the proposed algorithms converge to their true values under the persistent excitation conditions.

Journal ArticleDOI
TL;DR: The linear rate of convergence of the alternating direction method of multipliers (ADMM) for solving linearly constrained convex composite optimization problems is proved, and the usefulness of the obtained results is demonstrated when applied to two- and multi-block convex quadratic (semidefinite) programming.
Abstract: In this paper, we aim to prove the linear rate of convergence of the alternating direction method of multipliers (ADMM) for solving linearly constrained convex composite optimization problems. Under a mild calmness condition, which holds automatically for convex composite piecewise linear-quadratic programming, we establish the global Q-linear rate of convergence for a general semi-proximal ADMM with the dual step-length being taken in $(0, (1+\sqrt{5})/2)$. This semi-proximal ADMM, which covers the classic one, has the advantage of resolving the potential nonsolvability issue of the subproblems in the classic ADMM and possesses the ability to handle multi-block cases efficiently. We demonstrate the usefulness of the obtained results when applied to two- and multi-block convex quadratic (semidefinite) programming.