Author

# Gang Wu

Bio: Gang Wu is an academic researcher from University of Science and Technology of China. The author has contributed to research in topics: Optimization problem & Control theory. The author has an hindex of 12, co-authored 50 publications receiving 2574 citations.

##### Papers

More filters

••

TL;DR: A novel decentralized exact first-order algorithm (abbreviated as EXTRA) to solve the consensus optimization problem and uses a fixed, large step size, which can be determined independently of the network size or topology.

Abstract: Recently, there has been growing interest in solving consensus optimization problems in a multiagent network. In this paper, we develop a decentralized algorithm for the consensus optimization problem $\mathrm{minimize}_{x\in\mathbb{R}^p}~\bar{f}(x)=\frac{1}{n}\sum_{i=1}^n f_i(x),$ which is defined over a connected network of $n$ agents, where each function $f_i$ is held privately by agent $i$ and encodes the agent's data and objective. All the agents shall collaboratively find the minimizer while each agent can only communicate with its neighbors. Such a computation scheme avoids a data fusion center or long-distance communication and offers better load balance to the network. This paper proposes a novel decentralized exact first-order algorithm (abbreviated as EXTRA) to solve the consensus optimization problem. “Exact” means that it can converge to the exact solution. EXTRA uses a fixed, large step size, which can be determined independently of the network size or topology. The local variable of every a...

906 citations

••

TL;DR: This paper establishes its linear convergence rate for the decentralized consensus optimization problem with strongly convex local objective functions in terms of the network topology, the properties ofLocal objective functions, and the algorithm parameter.

Abstract: In decentralized consensus optimization, a connected network of agents collaboratively minimize the sum of their local objective functions over a common decision variable, where their information exchange is restricted between the neighbors. To this end, one can first obtain a problem reformulation and then apply the alternating direction method of multipliers (ADMM). The method applies iterative computation at the individual agents and information exchange between the neighbors. This approach has been observed to converge quickly and deemed powerful. This paper establishes its linear convergence rate for the decentralized consensus optimization problem with strongly convex local objective functions. The theoretical convergence rate is explicitly given in terms of the network topology, the properties of local objective functions, and the algorithm parameter. This result is not only a performance guarantee but also a guideline toward accelerating the ADMM convergence.

836 citations

•

TL;DR: In this paper, a decentralized algorithm called EXTRA was proposed to solve the consensus optimization problem in a multi-agent network, where each function is held privately by an agent and the objective function is shared among all the agents.

Abstract: Recently, there have been growing interests in solving consensus optimization problems in a multi-agent network. In this paper, we develop a decentralized algorithm for the consensus optimization problem $$\min\limits_{x\in\mathbb{R}^p}~\bar{f}(x)=\frac{1}{n}\sum\limits_{i=1}^n f_i(x),$$ which is defined over a connected network of $n$ agents, where each function $f_i$ is held privately by agent $i$ and encodes the agent's data and objective. All the agents shall collaboratively find the minimizer while each agent can only communicate with its neighbors. Such a computation scheme avoids a data fusion center or long-distance communication and offers better load balance to the network.
This paper proposes a novel decentralized EXact firsT-ordeR Algorithm (abbreviated as EXTRA) to solve the consensus optimization problem. "exact" means that it can converge to the exact solution. EXTRA can use a fixed large step size, {which is independent of the network size}, and has synchronized iterations. The local variable of every agent $i$ converges uniformly and consensually to an exact minimizer of $\bar{f}$. In contrast, the well-known decentralized gradient descent (DGD) method must use diminishing step sizes in order to converge to an exact minimizer. EXTRA and DGD have the same choice of mixing matrices and similar per-iteration complexity. EXTRA, however, uses the gradients of last two iterates, unlike DGD which uses just that of last iterate.
EXTRA has the best known convergence rates among the existing first-order decentralized algorithms. Specifically, if $f_i$'s are convex and have Lipschitz continuous gradients, EXTRA has an ergodic convergence rate $O(\frac{1}{k})$ in terms of the first-order optimality residual. If $\bar{f}$ is also restricted strongly convex, EXTRA converges to an optimal solution at a linear rate $O(C^{-k})$ for some constant $C>1$.

735 citations

••

TL;DR: A proximal gradient exact first-order algorithm (PG-EXTRA) that utilizes the composite structure and has the best known convergence rate and is a nontrivial extension to the recent algorithm EXTRA.

Abstract: This paper proposes a decentralized algorithm for solving a consensus optimization problem defined in a static networked multi-agent system, where the local objective functions have the smooth+nonsmooth composite form. Examples of such problems include decentralized constrained quadratic programming and compressed sensing problems, as well as many regularization problems arising in inverse problems, signal processing, and machine learning, which have decentralized applications. This paper addresses the need for efficient decentralized algorithms that take advantages of proximal operations for the nonsmooth terms. We propose a proximal gradient exact first-order algorithm (PG-EXTRA) that utilizes the composite structure and has the best known convergence rate. It is a nontrivial extension to the recent algorithm EXTRA. At each iteration, each agent locally computes a gradient of the smooth part of its objective and a proximal map of the nonsmooth part, as well as exchanges information with its neighbors. The algorithm is “exact” in the sense that an exact consensus minimizer can be obtained with a fixed step size, whereas most previous methods must use diminishing step sizes. When the smooth part has Lipschitz gradients, PG-EXTRA has an ergodic convergence rate of $O\left({1\over k}\right)$ in terms of the first-order optimality residual. When the smooth part vanishes, PG-EXTRA reduces to P-EXTRA, an algorithm without the gradients (so no “G” in the name), which has a slightly improved convergence rate at $o\left({1\over k}\right)$ in a standard (non-ergodic) sense. Numerical experiments demonstrate effectiveness of PG-EXTRA and validate our convergence results

284 citations

••

TL;DR: DLM combines the rapid convergence of ADMM with the low computational burden of DGM, and is proven to converge to the optimal solution when the local cost functions have Lipschitz continuous gradients.

Abstract: This paper develops the Decentralized Linearized Alternating Direction Method of Multipliers (DLM) that minimizes a sum of local cost functions in a multiagent network. The algorithm mimics operation of the decentralized alternating direction method of multipliers (DADMM) except that it linearizes the optimization objective at each iteration. This results in iterations that, instead of successive minimizations, implement steps whose cost is akin to the much lower cost of the gradient descent step used in the distributed gradient method (DGM). The algorithm is proven to converge to the optimal solution when the local cost functions have Lipschitz continuous gradients. Its rate of convergence is shown to be linear if the local cost functions are further assumed to be strongly convex. Numerical experiments in least squares and logistic regression problems show that the number of iterations to achieve equivalent optimality gaps are similar for DLM and ADMM and both much smaller than those of DGM. In that sense, DLM combines the rapid convergence of ADMM with the low computational burden of DGM.

204 citations

##### Cited by

More filters

••

TL;DR: This paper establishes its linear convergence rate for the decentralized consensus optimization problem with strongly convex local objective functions in terms of the network topology, the properties ofLocal objective functions, and the algorithm parameter.

Abstract: In decentralized consensus optimization, a connected network of agents collaboratively minimize the sum of their local objective functions over a common decision variable, where their information exchange is restricted between the neighbors. To this end, one can first obtain a problem reformulation and then apply the alternating direction method of multipliers (ADMM). The method applies iterative computation at the individual agents and information exchange between the neighbors. This approach has been observed to converge quickly and deemed powerful. This paper establishes its linear convergence rate for the decentralized consensus optimization problem with strongly convex local objective functions. The theoretical convergence rate is explicitly given in terms of the network topology, the properties of local objective functions, and the algorithm parameter. This result is not only a performance guarantee but also a guideline toward accelerating the ADMM convergence.

836 citations

•

TL;DR: In this paper, a decentralized algorithm called EXTRA was proposed to solve the consensus optimization problem in a multi-agent network, where each function is held privately by an agent and the objective function is shared among all the agents.

Abstract: Recently, there have been growing interests in solving consensus optimization problems in a multi-agent network. In this paper, we develop a decentralized algorithm for the consensus optimization problem $$\min\limits_{x\in\mathbb{R}^p}~\bar{f}(x)=\frac{1}{n}\sum\limits_{i=1}^n f_i(x),$$ which is defined over a connected network of $n$ agents, where each function $f_i$ is held privately by agent $i$ and encodes the agent's data and objective. All the agents shall collaboratively find the minimizer while each agent can only communicate with its neighbors. Such a computation scheme avoids a data fusion center or long-distance communication and offers better load balance to the network.
This paper proposes a novel decentralized EXact firsT-ordeR Algorithm (abbreviated as EXTRA) to solve the consensus optimization problem. "exact" means that it can converge to the exact solution. EXTRA can use a fixed large step size, {which is independent of the network size}, and has synchronized iterations. The local variable of every agent $i$ converges uniformly and consensually to an exact minimizer of $\bar{f}$. In contrast, the well-known decentralized gradient descent (DGD) method must use diminishing step sizes in order to converge to an exact minimizer. EXTRA and DGD have the same choice of mixing matrices and similar per-iteration complexity. EXTRA, however, uses the gradients of last two iterates, unlike DGD which uses just that of last iterate.
EXTRA has the best known convergence rates among the existing first-order decentralized algorithms. Specifically, if $f_i$'s are convex and have Lipschitz continuous gradients, EXTRA has an ergodic convergence rate $O(\frac{1}{k})$ in terms of the first-order optimality residual. If $\bar{f}$ is also restricted strongly convex, EXTRA converges to an optimal solution at a linear rate $O(C^{-k})$ for some constant $C>1$.

735 citations

•

TL;DR: This paper studies a D-PSGD algorithm and provides the first theoretical analysis that indicates a regime in which decentralized algorithms might outperform centralized algorithms for distributed stochastic gradient descent.

Abstract: Most distributed machine learning systems nowadays, including TensorFlow and CNTK, are built in a centralized fashion. One bottleneck of centralized algorithms lies on high communication cost on the central node. Motivated by this, we ask, can decentralized algorithms be faster than its centralized counterpart?
Although decentralized PSGD (D-PSGD) algorithms have been studied by the control community, existing analysis and theory do not show any advantage over centralized PSGD (C-PSGD) algorithms, simply assuming the application scenario where only the decentralized network is available. In this paper, we study a D-PSGD algorithm and provide the first theoretical analysis that indicates a regime in which decentralized algorithms might outperform centralized algorithms for distributed stochastic gradient descent. This is because D-PSGD has comparable total computational complexities to C-PSGD but requires much less communication cost on the busiest node. We further conduct an empirical study to validate our theoretical analysis across multiple frameworks (CNTK and Torch), different network configurations, and computation platforms up to 112 GPUs. On network configurations with low bandwidth or high latency, D-PSGD can be up to one order of magnitude faster than its well-optimized centralized counterparts.

582 citations

••

TL;DR: This paper finds the optimal algorithm parameters that minimize the convergence factor of the ADMM iterates in the context of ℓ2-regularized minimization and constrained quadratic programming.

Abstract: The alternating direction method of multipliers (ADMM) has emerged as a powerful technique for large-scale structured optimization. Despite many recent results on the convergence properties of ADMM, a quantitative characterization of the impact of the algorithm parameters on the convergence times of the method is still lacking. In this paper we find the optimal algorithm parameters that minimize the convergence factor of the ADMM iterates in the context of l2-regularized minimization and constrained quadratic programming. Numerical examples show that our parameter selection rules significantly outperform existing alternatives in the literature.

484 citations

••

University of North Texas

^{1}, Royal Institute of Technology^{2}, Zhejiang University^{3}, Huazhong University of Science and Technology^{4}, Pacific Northwest National Laboratory^{5}, Tsinghua University^{6}, Chinese Academy of Sciences^{7}, Oak Ridge National Laboratory^{8}, University of Virginia^{9}TL;DR: This survey paper aims to offer a detailed overview of existing distributed optimization algorithms and their applications in power systems, and focuses on the application of distributed optimization in the optimal coordination of distributed energy resources.

468 citations