Journal ArticleDOI

Parallel Multi-Block ADMM with o(1 / k) Convergence

TL;DR: The classic ADMM can be extended to the N-block Jacobi fashion and preserve convergence in the following two cases: (i) the matrices $$A_i$$ are mutually near-orthogonal and have full column rank, or (ii) proximal terms are added to the N subproblems (but without any assumption on the matrices $$A_i$$).
Abstract: This paper introduces a parallel and distributed algorithm for solving the following minimization problem with linear constraints: $$\begin{aligned} \text{minimize} ~~& f_1(\mathbf{x}_1) + \cdots + f_N(\mathbf{x}_N)\\ \text{subject to}~~& A_1 \mathbf{x}_1 + \cdots + A_N \mathbf{x}_N = c,\\ & \mathbf{x}_1\in {\mathcal {X}}_1,~\ldots , ~\mathbf{x}_N\in {\mathcal {X}}_N, \end{aligned}$$ where $$N \ge 2$$, the $$f_i$$ are convex functions, the $$A_i$$ are matrices, and $${\mathcal {X}}_i$$ is the feasible set for variable $$\mathbf{x}_i$$. Our algorithm extends the alternating direction method of multipliers (ADMM): it decomposes the original problem into N smaller subproblems and solves them in parallel at each iteration. This paper shows that the classic ADMM can be extended to the N-block Jacobi fashion and preserve convergence in the following two cases: (i) the matrices $$A_i$$ are mutually near-orthogonal and have full column rank, or (ii) proximal terms are added to the N subproblems (but without any assumption on the matrices $$A_i$$). In the latter case, certain proximal terms let the subproblems be solved in more flexible and efficient ways. We show that $$\Vert {\mathbf {x}}^{k+1} - {\mathbf {x}}^k\Vert _M^2$$ converges at a rate of o(1/k), where M is a symmetric positive semi-definite matrix. Since the parameters used in the convergence analysis are conservative, we introduce a strategy for automatically tuning them, which substantially accelerates our algorithm in practice. We implemented our algorithm (for case ii above) on Amazon EC2 and tested it on basis pursuit problems with more than 300 GB of distributed data. To our knowledge, this is the first report of successfully solving a compressive sensing problem at such a large scale.
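To make the parallel (Jacobi) update concrete, here is a minimal NumPy sketch of a prox-linearized variant of the N-block scheme described above; it is an illustration under simplifying assumptions (prox-linear subproblems, hand-picked step sizes `tau`, penalty `rho`, and damping `gamma`), not the paper's exact parameter rules.

```python
import numpy as np

def jacobi_prox_admm(A, c, prox, rho=1.0, gamma=1.0, iters=500):
    """Sketch of an N-block Jacobi (parallel) proximal ADMM for
    min sum_i f_i(x_i)  s.t.  A_1 x_1 + ... + A_N x_N = c.

    A    : list of matrices A_i (each m-by-n_i)
    c    : right-hand side, shape (m,)
    prox : list of maps prox[i](v, t) ~ argmin_x f_i(x) + (1/(2t))||x - v||^2
    The proximal weights tau below are a conservative illustrative choice.
    """
    N, m = len(A), c.shape[0]
    x = [np.zeros(Ai.shape[1]) for Ai in A]
    lam = np.zeros(m)                                   # dual variable
    tau = [N * rho * np.linalg.norm(Ai, 2) ** 2 for Ai in A]
    for _ in range(iters):
        r = sum(Ai @ xi for Ai, xi in zip(A, x)) - c    # residual A x^k - c
        # every block update uses the same x^k, so the N solves can run in parallel
        x = [prox[i](x[i] - (rho / tau[i]) * A[i].T @ (r + lam / rho), 1.0 / tau[i])
             for i in range(N)]
        r = sum(Ai @ xi for Ai, xi in zip(A, x)) - c
        lam = lam + gamma * rho * r                     # dual ascent step
    return x, lam
```

For the basis pursuit experiments mentioned in the abstract, each f_i would be an l1 norm and prox[i] the corresponding soft-thresholding operator.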


Citations
Journal ArticleDOI
TL;DR: In this paper, the convergence of the alternating direction method of multipliers (ADMM) for minimizing a nonconvex and possibly nonsmooth objective function is analyzed, subject to coupled linear equality constraints.
Abstract: In this paper, we analyze the convergence of the alternating direction method of multipliers (ADMM) for minimizing a nonconvex and possibly nonsmooth objective function, $$\phi(x_0,\ldots,x_p,y)$$, subject to coupled linear equality constraints. Our ADMM updates each of the primal variables $$x_0,\ldots,x_p,y$$, followed by updating the dual variable. We separate the variable y from the $$x_i$$'s as it has a special role in our analysis. The developed convergence guarantee covers a variety of nonconvex functions such as piecewise linear functions, the $$\ell_q$$ quasi-norm, and the Schatten-q quasi-norm ($$0 < q < 1$$), …

867 citations

Journal ArticleDOI
TL;DR: It is shown that in the presence of nonconvex objective function, classical ADMM is able to reach the set of stationary solutions for these problems, if the stepsize is chosen large enough.
Abstract: The alternating direction method of multipliers (ADMM) is widely used to solve large-scale linearly constrained optimization problems, convex or nonconvex, in many engineering fields. However there is a general lack of theoretical understanding of the algorithm when the objective function is nonconvex. In this paper we analyze the convergence of the ADMM for solving certain nonconvex consensus and sharing problems. We show that the classical ADMM converges to the set of stationary solutions, provided that the penalty parameter in the augmented Lagrangian is chosen to be sufficiently large. For the sharing problems, we show that the ADMM is convergent regardless of the number of variable blocks. Our analysis does not impose any assumptions on the iterates generated by the algorithm and is broadly applicable to many ADMM variants involving proximal update rules and various flexible block selection rules.
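As an illustration of the consensus structure analyzed here, the following is a minimal sketch of global consensus ADMM; the prox list, `rho`, and iteration count are placeholders, and with nonconvex g_i the abstract's point is that the penalty `rho` must be taken sufficiently large.

```python
import numpy as np

def consensus_admm(prox_g, n, rho=10.0, iters=300):
    """Global consensus ADMM: min sum_i g_i(x_i)  s.t.  x_i = z for all i.
    prox_g[i](v, t) ~ argmin_x g_i(x) + (1/(2t))||x - v||^2."""
    N = len(prox_g)
    x = [np.zeros(n) for _ in range(N)]
    u = [np.zeros(n) for _ in range(N)]                          # scaled dual variables
    z = np.zeros(n)
    for _ in range(iters):
        x = [prox_g[i](z - u[i], 1.0 / rho) for i in range(N)]   # block updates (parallel)
        z = sum(xi + ui for xi, ui in zip(x, u)) / N             # consensus (averaging) step
        u = [ui + xi - z for xi, ui in zip(x, u)]                # dual updates
    return z
```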

711 citations


Cites background from "Parallel Multi-Block ADMM with o(1 ..."

  • ...Therefore, most research effort in this direction has been focused on either analyzing problems with additional conditions or showing convergence for variants of the ADMM; see, for example, [26, 27, 28, 29, 30, 31, 32, 33, 34]....

    [...]

Journal ArticleDOI
TL;DR: This note proposes a novel approach to derive a worst-case convergence rate measured by the iteration complexity in a non-ergodic sense for the Douglas–Rachford alternating direction method of multipliers proposed by Glowinski and Marrocco.
Abstract: This note proposes a novel approach to derive a worst-case $$O(1/k)$$ convergence rate, measured by the iteration complexity in a non-ergodic sense, for the Douglas–Rachford alternating direction method of multipliers proposed by Glowinski and Marrocco.

314 citations


Additional excerpts

  • ...2 in [2], we can immediately refine the worst-case convergence rate in Theorem 6....

    [...]

Posted Content
TL;DR: ADMM might be a better choice than ALM for some nonconvex nonsmooth problems, because ADMM is not only easier to implement, it is also more likely to converge for the concerned scenarios.
Abstract: In this paper, we analyze the convergence of the alternating direction method of multipliers (ADMM) for minimizing a nonconvex and possibly nonsmooth objective function, $\phi(x_0,\ldots,x_p,y)$, subject to coupled linear equality constraints. Our ADMM updates each of the primal variables $x_0,\ldots,x_p,y$, followed by updating the dual variable. We separate the variable $y$ from the $x_i$'s as it has a special role in our analysis. The developed convergence guarantee covers a variety of nonconvex functions such as piecewise linear functions, the $\ell_q$ quasi-norm, and the Schatten-$q$ quasi-norm ($0 < q < 1$), …

300 citations

Book ChapterDOI
TL;DR: This chapter tackles the discrepancy between theory and practice and uncovers fundamental limits of a class of operator-splitting schemes, showing that the relaxed Peaceman-Rachford splitting algorithm is nearly as fast as the proximal point algorithm in the ergodic sense and nearly as slow as the subgradient method in the nonergodic sense.
Abstract: Operator-splitting schemes are iterative algorithms for solving many types of numerical problems. A lot is known about these methods: they converge, and in many cases we know how quickly they converge. But when they are applied to optimization problems, there is a gap in our understanding: The theoretical speed of operator-splitting schemes is nearly always measured in the ergodic sense, but ergodic operator-splitting schemes are rarely used in practice. In this chapter, we tackle the discrepancy between theory and practice and uncover fundamental limits of a class of operator-splitting schemes. Our surprising conclusion is that the relaxed Peaceman-Rachford splitting algorithm, a version of the Alternating Direction Method of Multipliers (ADMM), is nearly as fast as the proximal point algorithm in the ergodic sense and nearly as slow as the subgradient method in the nonergodic sense. A large class of operator-splitting schemes extend from the relaxed Peaceman-Rachford splitting algorithm. Our results show that this class of operator-splitting schemes is also nearly as slow as the subgradient method. The tools we create in this chapter can also be used to prove nonergodic convergence rates of more general splitting schemes, so they are interesting in their own right.
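For reference, the relaxed Peaceman-Rachford iteration discussed in the chapter can be written as a fixed-point update on an auxiliary variable z; below is a minimal sketch in which `prox_f`, `prox_g`, the step `gamma`, and the relaxation `lam` are illustrative (lam = 0.5 recovers Douglas-Rachford/ADMM, lam = 1.0 the Peaceman-Rachford scheme).

```python
def relaxed_prs(prox_f, prox_g, z0, lam=0.5, gamma=1.0, iters=200):
    """Relaxed Peaceman-Rachford splitting for min f(x) + g(x).
    prox_f(v, t) ~ argmin_x f(x) + (1/(2t))||x - v||^2, likewise prox_g."""
    z = z0
    for _ in range(iters):
        x = prox_f(z, gamma)
        y = prox_g(2 * x - z, gamma)       # evaluate g's prox at the reflection 2x - z
        z = z + 2 * lam * (y - x)          # relaxed fixed-point update
    return x
```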

264 citations

References
Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
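For contrast with the multi-block scheme of the main paper, here is a minimal two-block ADMM in the scaled form used throughout this review, applied to the lasso as an illustrative example; the problem data `A`, `b` and the weight `lam_reg` are placeholders.

```python
import numpy as np

def lasso_admm(A, b, lam_reg, rho=1.0, iters=200):
    """Two-block ADMM (scaled form) for min 0.5*||A x - b||^2 + lam_reg*||z||_1, s.t. x = z."""
    m, n = A.shape
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)     # u is the scaled dual variable
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))   # factor once, reuse every iteration
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam_reg / rho, 0.0)  # soft-threshold
        u = u + x - z                                    # scaled dual update
    return z
```

Caching the Cholesky factor of A^T A + rho*I so that each iteration reduces to triangular solves is a standard per-iteration saving for this class of problems.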

17,433 citations


"Parallel Multi-Block ADMM with o(1 ..." refers background or methods in this paper

  • ...Note that x1 is not part of w because x1 can be regarded as an intermediate variable in the iterations of ADMM, whereas (x2, λ) are the essential variables [3]....

    [...]

  • ...See [11, 1, 3, 29, 4, 24, 22, 31, 23] and the references therein for a number of examples....

    [...]

  • ...1) with N ≥ 3 using ADMM, one can first convert the multiblock problem into an equivalent two-block problem via variable splitting [1, 3, 30]:...

    [...]

01 Feb 1977

5,933 citations


"Parallel Multi-Block ADMM with o(1 ..." refers background in this paper

  • ...4 of [33], taking limit over the subsequence {k_j} on both sides of (4....

    [...]

Book
01 Jan 1989
TL;DR: This work discusses parallel and distributed architectures, complexity measures, and communication and synchronization issues, and it presents both Jacobi and Gauss-Seidel iterations, which serve as algorithms of reference for many of the computational approaches addressed later.
Abstract: …engineering, computer science, operations research, and applied mathematics. It is essentially a self-contained work, with the development of the material occurring in the main body of the text and excellent appendices on linear algebra and analysis, graph theory, duality theory, and probability theory and Markov chains supporting it. The introduction discusses parallel and distributed architectures, complexity measures, and communication and synchronization issues, and it presents both Jacobi and Gauss-Seidel iterations, which serve as reference algorithms for many of the computational approaches addressed later. After the introduction, the text is organized in two parts: synchronous algorithms and asynchronous algorithms. The discussion of synchronous algorithms comprises four chapters, with Chapter 2 presenting both direct methods (converging to the exact solution within a finite number of steps) and iterative methods for linear…
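A minimal sketch contrasting the two reference iterations mentioned above on a linear system Ax = b (assuming A is, for instance, strictly diagonally dominant so both converge):

```python
import numpy as np

def jacobi_step(A, b, x):
    """All components are updated from the old x, so the step is trivially parallel."""
    D = np.diag(A)
    return (b - (A @ x - D * x)) / D

def gauss_seidel_step(A, b, x):
    """Each component uses the freshest values, so the sweep is inherently sequential."""
    x = x.copy()
    for i in range(len(b)):
        x[i] = (b[i] - A[i, :] @ x + A[i, i] * x[i]) / A[i, i]
    return x
```

The Jacobi step uses only old values and therefore parallelizes across components, which is exactly the property the main paper exploits at the block level.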

5,597 citations


"Parallel Multi-Block ADMM with o(1 ..." refers background or methods in this paper

  • ...See [11, 1, 3, 29, 4, 24, 22, 31, 23] and the references therein for a number of examples....

    [...]

  • ...1) with N ≥ 3 using ADMM, one can first convert the multiblock problem into an equivalent two-block problem via variable splitting [1, 3, 30]:...

    [...]

Journal ArticleDOI
TL;DR: A new approach for constructing efficient schemes for non-smooth convex optimization is proposed, based on a special smoothing technique, which can be applied to functions with explicit max-structure, and can be considered as an alternative to black-box minimization.
Abstract: In this paper we propose a new approach for constructing efficient schemes for non-smooth convex optimization. It is based on a special smoothing technique, which can be applied to functions with explicit max-structure. Our approach can be considered as an alternative to black-box minimization. From the viewpoint of efficiency estimates, we manage to improve the traditional bounds on the number of iterations of the gradient schemes from $$O(1/\epsilon^2)$$ to $$O(1/\epsilon)$$, keeping basically the complexity of each iteration unchanged.
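A concrete instance of the smoothing technique: the absolute value has the max-structure |t| = max over |s| <= 1 of s*t, and adding a quadratic prox term in s yields the Huber function, whose gradient is (1/mu)-Lipschitz. A minimal sketch, with the smoothing parameter `mu` chosen for illustration:

```python
import numpy as np

def huber(t, mu=0.1):
    """Smoothed |t|: max_{|s|<=1} (s*t - (mu/2)*s^2) = Huber function."""
    return np.where(np.abs(t) <= mu, t ** 2 / (2 * mu), np.abs(t) - mu / 2)

def huber_grad(t, mu=0.1):
    """Gradient of the smoothed |t|; Lipschitz constant 1/mu."""
    return np.clip(t / mu, -1.0, 1.0)
```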

2,948 citations


"Parallel Multi-Block ADMM with o(1 ..." refers methods in this paper

  • ...The dual smoothing method [30] can be applied under certain conditions and improve the rate to O(1/k)....

    [...]

Journal ArticleDOI
TL;DR: A dual method is proposed which decouples the difficulties relative to the functionals f and g from the possible ill-conditioning effects of the linear operator A and leads to an efficient and simply implementable algorithm.
Abstract: For variational problems of the form $$\inf_{v\in V}\{f(Av)+g(v)\}$$, we propose a dual method which decouples the difficulties relative to the functionals f and g from the possible ill-conditioning effects of the linear operator A. The approach is based on the use of an augmented Lagrangian functional and leads to an efficient and simply implementable algorithm. We also study the finite element approximation of such problems, compatible with the use of our algorithm. The method is finally applied to solve several problems of continuum mechanics.
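A minimal sketch of the splitting idea behind this dual method: introducing w = Av separates the difficulty of f from that of g and from the conditioning of A. To keep the v-step a plain linear solve, the sketch uses the illustrative quadratic choice g(v) = (mu/2)||v - b||^2; `prox_f`, `rho`, `mu`, and `b` are placeholders.

```python
import numpy as np

def dual_split_admm(A, prox_f, mu, b, rho=1.0, iters=300):
    """Splitting sketch for min_v (mu/2)*||v - b||^2 + f(A v), rewritten with w = A v.
    prox_f(t, s) ~ argmin_w f(w) + (1/(2s))||w - t||^2 handles f alone."""
    m, n = A.shape
    w, lam = np.zeros(m), np.zeros(m)          # lam: multiplier for the constraint A v = w
    H = mu * np.eye(n) + rho * A.T @ A         # v-step normal equations
    for _ in range(iters):
        v = np.linalg.solve(H, mu * b + rho * A.T @ (w - lam / rho))
        w = prox_f(A @ v + lam / rho, 1.0 / rho)   # f enters only through its prox
        lam = lam + rho * (A @ v - w)              # multiplier update
    return v
```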

2,500 citations


"Parallel Multi-Block ADMM with o(1 ..." refers background in this paper

  • ...The convergence of the standard two-block ADMM has been long established in the literature [10, 12]....

    [...]

  • ...8) has been long established and its proof dates back to [10, 12]....

    [...]

  • ...ADMM was introduced in [10, 12] to solve the special case of problem (1....

    [...]