Showing papers in "arXiv: Optimization and Control" in 2017

Posted Content

Distributed Observers Design for Leader-Following Control of Multi-Agent Networks (Extended Version)

Yiguang Hong, Guanrong Chen, Linda Bushnell

31 Dec 2017-arXiv: Optimization and Control

TL;DR: It is proved that each agent can follow the active leader of a multi-agent system with a switching interconnection topology with an explicitly constructed common Lyapunov function (CLF).

Abstract: This paper is concerned with a leader-follower problem for a multi-agent system with a switching interconnection topology. Distributed observers are designed for the second-order follower-agents, under the common assumption that the velocity of the active leader cannot be measured in real time. Some dynamic neighbor-based rules, consisting of distributed controllers and observers for the autonomous agents, are developed to keep updating the information of the leader. With the help of an explicitly constructed common Lyapunov function (CLF), it is proved that each agent can follow the active leader. Moreover, the tracking error is estimated even in a noisy environment. Finally, a numerical example is given for illustration.

957 citations

Posted Content

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

Xiangru Lian¹, Ce Zhang², Huan Zhang³, Cho-Jui Hsieh³, Wei Zhang⁴, Ji Liu¹

University of Rochester¹, ETH Zurich², University of California, Davis³, IBM⁴

25 May 2017-arXiv: Optimization and Control

TL;DR: This paper studies a D-PSGD algorithm and provides the first theoretical analysis that indicates a regime in which decentralized algorithms might outperform centralized algorithms for distributed stochastic gradient descent.

Abstract: Most distributed machine learning systems nowadays, including TensorFlow and CNTK, are built in a centralized fashion. One bottleneck of centralized algorithms lies in the high communication cost at the central node. Motivated by this, we ask: can decentralized algorithms be faster than their centralized counterparts? Although decentralized PSGD (D-PSGD) algorithms have been studied by the control community, existing analysis and theory do not show any advantage over centralized PSGD (C-PSGD) algorithms, simply assuming the application scenario where only the decentralized network is available. In this paper, we study a D-PSGD algorithm and provide the first theoretical analysis that indicates a regime in which decentralized algorithms might outperform centralized algorithms for distributed stochastic gradient descent. This is because D-PSGD has comparable total computational complexity to C-PSGD but incurs much lower communication cost on the busiest node. We further conduct an empirical study to validate our theoretical analysis across multiple frameworks (CNTK and Torch), different network configurations, and computation platforms up to 112 GPUs. On network configurations with low bandwidth or high latency, D-PSGD can be up to one order of magnitude faster than its well-optimized centralized counterparts.
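
The D-PSGD iteration itself is short. The following is a minimal NumPy sketch under our own assumptions, not the authors' code: the ring topology with uniform 1/3 weights, the quadratic per-node objectives, and the helper names `ring_mixing_matrix` and `d_psgd` are all hypothetical. Each node averages its model with its two neighbors and then takes a local gradient step, so no node ever communicates with more than two others.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic weights for a ring of n >= 3 nodes:
    each node averages itself and its two neighbors with weight 1/3."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def d_psgd(grad, x0, W, lr, steps, noise=0.0, seed=0):
    """Hypothetical helper: row i of x is node i's local model."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        # Each node evaluates a (noisy) gradient at its own current model.
        g = grad(x) + noise * rng.standard_normal(x.shape)
        # One gossip round with neighbors (W @ x) plus the local step;
        # no central node is involved.
        x = W @ x - lr * g
    return x


# Hypothetical problem: node i holds f_i(x) = 0.5 * (x - b_i)^2,
# so the minimizer of the average objective is mean(b) = 3.5.
b = np.arange(8, dtype=float).reshape(-1, 1)
x = d_psgd(lambda x: x - b, np.zeros_like(b), ring_mixing_matrix(8),
           lr=0.01, steps=5000)
print(x.mean(), x.max() - x.min())
```

In this toy problem every f_i has unit curvature and W is doubly stochastic, so the average of the local models follows plain gradient descent and converges to 3.5; with a constant step size the individual models agree only up to a gap that shrinks with `lr`.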

582 citations

Journal Article

OSQP: An Operator Splitting Solver for Quadratic Programs

Bartolomeo Stellato¹, Goran Banjac², Paul J. Goulart³, Alberto Bemporad⁴, Stephen Boyd⁵

Massachusetts Institute of Technology¹, ETH Zurich², University of Oxford³, IMT Institute for Advanced Studies Lucca⁴, Stanford University⁵

21 Nov 2017-arXiv: Optimization and Control

TL;DR: This work presents a general-purpose solver for convex quadratic programs based on the alternating direction method of multipliers, employing a novel operator splitting technique that requires the solution of a quasi-definite linear system with the same coefficient matrix at almost every iteration.

Abstract: We present a general purpose solver for quadratic programs based on the alternating direction method of multipliers, employing a novel operator splitting technique that requires the solution of a quasi-definite linear system with the same coefficient matrix in each iteration. Our algorithm is very robust, placing no requirements on the problem data such as positive definiteness of the objective function or linear independence of the constraint functions. It is division-free once an initial matrix factorization is carried out, making it suitable for real-time applications in embedded systems. In addition, our technique is the first operator splitting method for quadratic programs able to reliably detect primal and dual infeasible problems from the algorithm iterates. The method also supports factorization caching and warm starting, making it particularly efficient when solving parametrized problems arising in finance, control, and machine learning. Our open-source C implementation OSQP has a small footprint, is library-free, and has been extensively tested on many problem instances from a wide variety of application areas. It is typically ten times faster than competing interior point methods, and sometimes much more when factorization caching or warm start is used.
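
The core ADMM iteration for a QP of the form $\min \tfrac12 x^\top P x + q^\top x$ subject to $l \le Ax \le u$ can be sketched as below. This sketch is ours, not the OSQP source: it uses the reduced linear system $(P + \sigma I + \rho A^\top A)\tilde x = \sigma x - q + A^\top(\rho z - y)$, whose matrix never changes between iterations, and caches an explicit inverse for brevity where OSQP instead factors a quasi-definite KKT matrix. The fixed $\rho$, the absence of problem scaling and infeasibility detection, and the name `admm_qp` are simplifying assumptions.

```python
import numpy as np


def admm_qp(P, q, A, l, u, rho=1.0, sigma=1e-6, alpha=1.6,
            eps=1e-6, max_iter=20000):
    """ADMM sketch for: minimize 0.5 x'Px + q'x  subject to  l <= Ax <= u."""
    n, m = P.shape[0], A.shape[0]
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    # The system matrix is the same at every iteration, so it is factored
    # (here: inverted, for brevity) exactly once.
    K_inv = np.linalg.inv(P + sigma * np.eye(n) + rho * A.T @ A)
    for k in range(max_iter):
        x_tilde = K_inv @ (sigma * x - q + A.T @ (rho * z - y))
        z_tilde = A @ x_tilde
        # Over-relaxation with parameter alpha.
        x_new = alpha * x_tilde + (1.0 - alpha) * x
        z_relax = alpha * z_tilde + (1.0 - alpha) * z
        # Projection onto the box [l, u].
        z_new = np.clip(z_relax + y / rho, l, u)
        y = y + rho * (z_relax - z_new)
        x, z = x_new, z_new
        r_prim = np.max(np.abs(A @ x - z))
        r_dual = np.max(np.abs(P @ x + q + A.T @ y))
        if r_prim < eps and r_dual < eps:
            break
    return x, y, k + 1


# Hypothetical instance with one equality row (l = u) and two box rows.
P = np.array([[4.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
l = np.array([1.0, 0.0, 0.0])
u = np.array([1.0, 0.7, 0.7])
x, y, iters = admm_qp(P, q, A, l, u)
print(x, iters)
```

For this instance the equality row forces x1 + x2 = 1; along that line the unconstrained minimizer x1 = 0.25 would violate x2 <= 0.7, so that bound is active and the solution is x = (0.3, 0.7).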

442 citations

Journal Article

Learned Primal-dual Reconstruction

Jonas Adler¹, Ozan Öktem¹

Royal Institute of Technology¹

20 Jul 2017-arXiv: Optimization and Control

TL;DR: In this article, the learned primal-dual (LPD) algorithm is proposed for tomographic reconstruction, where the proximal operators have been replaced with convolutional neural networks and the algorithm is trained end-to-end, working directly from raw measured data.

Abstract: We propose the Learned Primal-Dual algorithm for tomographic reconstruction. The algorithm accounts for a (possibly non-linear) forward operator in a deep neural network by unrolling a proximal primal-dual optimization method, but where the proximal operators have been replaced with convolutional neural networks. The algorithm is trained end-to-end, working directly from raw measured data, and it does not depend on any initial reconstruction such as filtered back-projection (FBP). We compare performance of the proposed method on low-dose CT reconstruction against FBP, total variation (TV) regularization, and deep-learning-based post-processing of FBP. For the Shepp-Logan phantom we obtain a PSNR improvement of more than 6 dB over all compared methods. For human phantoms the corresponding improvement is 6.6 dB over TV and 2.2 dB over learned post-processing, along with a substantial improvement in the SSIM. Finally, our algorithm involves only ten forward-back-projection computations, making the method feasible for time-critical clinical applications.

403 citations

Posted Content

A Rewriting System for Convex Optimization Problems

Akshay Agrawal¹, Robin Verschueren², Steven Diamond¹, Stephen Boyd¹

Stanford University¹, University of Freiburg²

13 Sep 2017-arXiv: Optimization and Control

TL;DR: In this article, a modular rewriting system for translating optimization problems written in a domain-specific language to forms compatible with low-level solver interfaces is described, facilitated by reductions which accept a category of problems and transform instances of that category to equivalent instances of another category.

Abstract: We describe a modular rewriting system for translating optimization problems written in a domain-specific language to forms compatible with low-level solver interfaces. Translation is facilitated by reductions, which accept a category of problems and transform instances of that category to equivalent instances of another category. Our system proceeds in two key phases: analysis, in which we attempt to find a suitable solver for a supplied problem, and canonicalization, in which we rewrite the problem in the selected solver's standard form. We implement the described system in version 1.0 of CVXPY, a domain-specific language for mathematical and especially convex optimization. By treating reductions as first-class objects, our method makes it easy to match problems to solvers well-suited for them and to support solvers with a wide variety of standard forms.

323 citations

Posted Content

A Review on Bilevel Optimization: From Classical to Evolutionary Approaches and Applications

Ankur Sinha¹, Pekka Malo², Kalyanmoy Deb³

Indian Institute of Management Ahmedabad¹, Aalto University², Michigan State University³

17 May 2017-arXiv: Optimization and Control

TL;DR: This paper reviews bilevel optimization from basic principles to solution strategies, both classical and evolutionary, and performs an automated text analysis of an extended list of papers published on bilevel optimization.

Abstract: Bilevel optimization is defined as a mathematical program in which an optimization problem contains another optimization problem as a constraint. These problems have received significant attention from the mathematical programming community. Only limited work exists on bilevel problems using evolutionary computation techniques; however, recently there has been an increasing interest due to the proliferation of practical applications and the potential of evolutionary algorithms in tackling these problems. This paper provides a comprehensive review of bilevel optimization, from the basic principles to solution strategies, both classical and evolutionary. A number of potential application problems are also discussed. To offer readers insights into the prominent developments in the field of bilevel optimization, we have performed an automated text analysis of an extended list of papers published on bilevel optimization to date. This paper should motivate evolutionary computation researchers to pay more attention to this practical yet challenging area.
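
To make the nested structure concrete, here is a deliberately naive sketch (ours, not from the review) that solves a tiny bilevel problem by grid search: for every upper-level candidate x it re-solves the lower-level problem, then evaluates the upper-level objective at the follower's response. The instance, the grid, and the name `solve_bilevel_grid` are hypothetical; this brute-force nesting only works for toy problems.

```python
import numpy as np


def solve_bilevel_grid(F, f, xs, ys):
    """Brute-force sketch of  min_x F(x, y*(x))  with  y*(x) = argmin_y f(x, y).

    The lower-level (follower) problem is re-solved by grid search for every
    upper-level (leader) candidate x.
    """
    best_value, best_x, best_y = np.inf, None, None
    for x in xs:
        y_star = ys[np.argmin(f(x, ys))]   # follower's best response to x
        value = F(x, y_star)               # leader's objective at that response
        if value < best_value:
            best_value, best_x, best_y = value, x, y_star
    return best_value, best_x, best_y


# Hypothetical instance: the follower minimizes (y - x)^2,
# the leader minimizes (x - 1)^2 + (y - 2)^2.
grid = np.linspace(0.0, 3.0, 3001)
value, x_opt, y_opt = solve_bilevel_grid(
    lambda x, y: (x - 1.0) ** 2 + (y - 2.0) ** 2,
    lambda x, y: (y - x) ** 2,
    grid, grid)
print(value, x_opt, y_opt)
```

The follower always answers y*(x) = x, so the leader effectively minimizes (x - 1)^2 + (x - 2)^2 and picks x = 1.5 with value 0.5. Ignoring the lower level would instead suggest (x, y) = (1, 2), a point the follower would never accept.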

268 citations

Posted Content

Optimal algorithms for smooth and strongly convex distributed optimization in networks

Kevin Scaman¹, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié¹

French Institute for Research in Computer Science and Automation¹

28 Feb 2017-arXiv: Optimization and Control

TL;DR: In this article, the authors determine the optimal convergence rates for strongly convex and smooth distributed optimization in two settings, centralized and decentralized communications over a network, and show that distributing Nesterov's accelerated gradient descent is optimal for centralized algorithms, achieving a precision ε > 0 in time O(√κ_g (1 + Δτ) ln(1/ε)).

Abstract: In this paper, we determine the optimal convergence rates for strongly convex and smooth distributed optimization in two settings: centralized and decentralized communications over a network. For centralized (i.e. master/slave) algorithms, we show that distributing Nesterov's accelerated gradient descent is optimal and achieves a precision $\varepsilon > 0$ in time $O(\sqrt{\kappa_g}(1+\Delta\tau)\ln(1/\varepsilon))$, where $\kappa_g$ is the condition number of the (global) function to optimize, $\Delta$ is the diameter of the network, and $\tau$ (resp. $1$) is the time needed to communicate values between two neighbors (resp. perform local computations). For decentralized algorithms based on gossip, we provide the first optimal algorithm, called the multi-step dual accelerated (MSDA) method, that achieves a precision $\varepsilon > 0$ in time $O(\sqrt{\kappa_l}(1+\frac{\tau}{\sqrt{\gamma}})\ln(1/\varepsilon))$, where $\kappa_l$ is the condition number of the local functions and $\gamma$ is the (normalized) eigengap of the gossip matrix used for communication between nodes. We then verify the efficiency of MSDA against state-of-the-art methods for two problems: least-squares regression and classification by logistic regression.
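
For reference, the single-machine iteration behind the centralized result is Nesterov's accelerated gradient method with constant momentum for an $L$-smooth, $\mu$-strongly convex function. The sketch below is our own, with a hypothetical diagonal quadratic; it shows where the $\sqrt{\kappa_g}$ factor comes from: with momentum $\beta = (\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ the error contracts roughly by $1 - 1/\sqrt{\kappa}$ per iteration. One natural reading of the $(1+\Delta\tau)$ factor is the per-iteration cost of one local computation plus a communication round across a network of diameter $\Delta$; that part is not simulated here.

```python
import numpy as np


def nesterov_agd(grad, x0, L, mu, steps):
    """Nesterov's accelerated gradient with constant momentum for an
    L-smooth, mu-strongly convex objective."""
    sqrt_kappa = np.sqrt(L / mu)
    beta = (sqrt_kappa - 1.0) / (sqrt_kappa + 1.0)
    x = np.array(x0, dtype=float)
    y = x.copy()
    for _ in range(steps):
        x_next = y - grad(y) / L           # gradient step from the extrapolated point
        y = x_next + beta * (x_next - x)   # momentum extrapolation
        x = x_next
    return x


# Hypothetical quadratic f(x) = 0.5 * sum(d_i * x_i^2) with condition number 100;
# its minimizer is x = 0.
d = np.linspace(1.0, 100.0, 10)
x = nesterov_agd(lambda v: d * v, np.ones(10), L=100.0, mu=1.0, steps=500)
print(np.linalg.norm(x))
```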

210 citations

Posted Content

On the Properties of the Softmax Function with Application in Game Theory and Reinforcement Learning

Bolin Gao, Lacra Pavel

03 Apr 2017-arXiv: Optimization and Control

TL;DR: This paper shows that the softmax function is the monotone gradient map of the log-sum-exp function and uses this connection to show that the inverse temperature parameter determines the Lipschitz and co-coercivity properties of the softmax function.

Abstract: In this paper, we utilize results from convex analysis and monotone operator theory to derive additional properties of the softmax function that have not yet been covered in the existing literature. In particular, we show that the softmax function is the monotone gradient map of the log-sum-exp function. By exploiting this connection, we show that the inverse temperature parameter determines the Lipschitz and co-coercivity properties of the softmax function. We then demonstrate the usefulness of these properties through an application in game-theoretic reinforcement learning.
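
Both properties can be checked numerically. In this sketch (ours; function names are hypothetical) `softmax(z, lam)` uses inverse temperature `lam`, and `log_sum_exp(z, lam)` is $\frac{1}{\lambda}\log\sum_i e^{\lambda z_i}$. A central finite difference confirms that the softmax is the gradient of this function, and random pairs are tested against $\lambda$-Lipschitz continuity and the matching $1/\lambda$ co-coercivity, which we take as the constants in question.

```python
import numpy as np


def log_sum_exp(z, lam=1.0):
    """(1/lam) * log(sum(exp(lam * z))), shifted by the max for stability."""
    s = lam * np.asarray(z, dtype=float)
    m = s.max()
    return (m + np.log(np.sum(np.exp(s - m)))) / lam


def softmax(z, lam=1.0):
    """Softmax with inverse temperature lam."""
    s = lam * np.asarray(z, dtype=float)
    e = np.exp(s - s.max())
    return e / e.sum()


def numerical_grad(f, z, h=1e-6):
    """Central finite-difference gradient of a scalar function f at z."""
    g = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = h
        g[i] = (f(z + step) - f(z - step)) / (2.0 * h)
    return g


lam = 2.0
rng = np.random.default_rng(0)
z = rng.standard_normal(5)
grad_gap = np.max(np.abs(numerical_grad(lambda v: log_sum_exp(v, lam), z)
                         - softmax(z, lam)))

# Check lam-Lipschitz continuity and (1/lam)-co-coercivity on random pairs.
checks = []
for _ in range(1000):
    a = 3.0 * rng.standard_normal(5)
    b = 3.0 * rng.standard_normal(5)
    diff = softmax(a, lam) - softmax(b, lam)
    lipschitz = np.linalg.norm(diff) <= lam * np.linalg.norm(a - b) + 1e-12
    cocoercive = diff @ (a - b) >= diff @ diff / lam - 1e-12
    checks.append(bool(lipschitz and cocoercive))
print(grad_gap, all(checks))
```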

183 citations

Posted Content

Smart "Predict, then Optimize"

Adam N. Elmachtoub¹, Paul Grigas²

Columbia University¹, University of California, Berkeley²

22 Oct 2017-arXiv: Optimization and Control

TL;DR: Numerical experiments show that the SPO framework can lead to significant improvement under the predict-then-optimize paradigm, in particular, when the prediction model being trained is misspecified.

Abstract: Many real-world analytics problems involve two significant challenges: prediction and optimization. Due to the typically complex nature of each challenge, the standard paradigm is predict-then-optimize. By and large, machine learning tools are intended to minimize prediction error and do not account for how the predictions will be used in the downstream optimization problem. In contrast, we propose a new and very general framework, called Smart "Predict, then Optimize" (SPO), which directly leverages the optimization problem structure, i.e., its objective and constraints, for designing better prediction models. A key component of our framework is the SPO loss function which measures the decision error induced by a prediction. Training a prediction model with respect to the SPO loss is computationally challenging, and thus we derive, using duality theory, a convex surrogate loss function which we call the SPO+ loss. Most importantly, we prove that the SPO+ loss is statistically consistent with respect to the SPO loss under mild conditions. Our SPO+ loss function can tractably handle any polyhedral, convex, or even mixed-integer optimization problem with a linear objective. Numerical experiments on shortest path and portfolio optimization problems show that the SPO framework can lead to significant improvement under the predict-then-optimize paradigm, in particular when the prediction model being trained is misspecified. We find that linear models trained using SPO+ loss tend to dominate random forest algorithms, even when the ground truth is highly nonlinear.
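
Both losses only need an oracle that minimizes a linear objective over the feasible region. The sketch below (our own, not the authors' code) uses a finite feasible set given as the rows of `W`, which covers a linear program solved by enumerating its vertices; the toy data and helper names are hypothetical. With $w^*(c)$ and $z^*(c)$ the optimal solution and value for cost vector $c$, it evaluates SPO$(\hat c, c) = c^\top w^*(\hat c) - z^*(c)$ and SPO+$(\hat c, c) = -z^*(2\hat c - c) + 2\hat c^\top w^*(c) - z^*(c)$.

```python
import numpy as np


def solve(c, W):
    """Minimize c^T w over a finite feasible set (rows of W); returns (w*, z*)."""
    vals = W @ c
    k = int(np.argmin(vals))
    return W[k], vals[k]


def spo_loss(c_hat, c, W):
    """Decision regret of acting on the prediction c_hat when the true cost is c."""
    w_hat, _ = solve(c_hat, W)
    _, z_star = solve(c, W)
    return float(c @ w_hat - z_star)


def spo_plus_loss(c_hat, c, W):
    """Convex surrogate: -z*(2 c_hat - c) + 2 c_hat^T w*(c) - z*(c)."""
    w_star, z_star = solve(c, W)
    _, z_shift = solve(2 * c_hat - c, W)   # same oracle, shifted cost
    return float(-z_shift + 2 * c_hat @ w_star - z_star)


W = np.eye(3)                     # hypothetical "pick one of three items" set
c = np.array([1.0, 2.0, 3.0])     # true costs
c_hat = np.array([3.0, 2.0, 1.0]) # a poor prediction
print(spo_loss(c_hat, c, W), spo_plus_loss(c_hat, c, W))   # 2.0 6.0
```

Predicting $\hat c = c$ gives SPO+ = 0, and here SPO+ (6.0) upper-bounds the decision regret SPO (2.0), consistent with its role as a convex surrogate.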

178 citations

Posted Content

Non-convex Finite-Sum Optimization Via SCSG Methods

Lihua Lei¹, Cheng Ju¹, Jianbo Chen¹, Michael I. Jordan¹

University of California, Berkeley¹

28 Jun 2017-arXiv: Optimization and Control

TL;DR: This paper develops a class of algorithms, as variants of the stochastically controlled stochastic gradient (SCSG) methods, for the smooth non-convex finite-sum optimization problem, and demonstrates empirically that SCSG outperforms stochastic gradient methods on training multilayer neural networks in terms of both training and validation loss.

Abstract: We develop a class of algorithms, as variants of the stochastically controlled stochastic gradient (SCSG) methods (Lei and Jordan, 2016), for the smooth non-convex finite-sum optimization problem. Assuming the smoothness of each component, the complexity of SCSG to reach a stationary point with $\mathbb{E} \|\nabla f(x)\|^{2}\le \epsilon$ is $O\left(\min\{\epsilon^{-5/3}, \epsilon^{-1}n^{2/3}\}\right)$, which strictly outperforms stochastic gradient descent. Moreover, SCSG is never worse than the state-of-the-art methods based on variance reduction and it significantly outperforms them when the target accuracy is low. A similar acceleration is also achieved when the functions satisfy the Polyak-Łojasiewicz condition. Empirical experiments demonstrate that SCSG outperforms stochastic gradient methods on training multilayer neural networks in terms of both training and validation loss.
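
A simplified sketch of the SCSG outer and inner loops, as we understand them from the description above: each epoch anchors on a gradient computed over a batch of B components (rather than all n), then takes a random, geometrically distributed number of variance-reduced mini-batch steps with mean B/b. The least-squares test instance (convex, chosen so the result is easy to check), the step size, and all names are hypothetical.

```python
import numpy as np


def scsg(grad_batch, n, w0, lr, B, b, epochs, seed=0):
    """Simplified SCSG sketch.

    grad_batch(w, idx) returns the average gradient of the components in idx.
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(epochs):
        anchor = w.copy()
        batch = rng.choice(n, size=B, replace=False)
        mu = grad_batch(anchor, batch)
        # numpy's geometric() has support {1, 2, ...}; subtracting 1 gives
        # support {0, 1, ...} with mean (B + b) / b - 1 = B / b.
        num_steps = rng.geometric(b / (B + b)) - 1
        for _ in range(num_steps):
            mini = rng.choice(n, size=b, replace=False)
            # SVRG-style correction, but anchored on a B-sample batch
            # rather than the full gradient.
            w = w - lr * (grad_batch(w, mini) - grad_batch(anchor, mini) + mu)
    return w


# Hypothetical noise-free least-squares instance.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = X @ rng.standard_normal(5)


def grad_batch(w, idx):
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)


def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)


w = scsg(grad_batch, 200, np.zeros(5), lr=0.01, B=50, b=5, epochs=300)
print(loss(np.zeros(5)), loss(w))
```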

171 citations

Posted Content

Network Topology and Communication-Computation Tradeoffs in Decentralized Optimization

Angelia Nedic¹, Alex Olshevsky², Michael G. Rabbat³

Arizona State University¹, Boston University², Facebook³

26 Sep 2017-arXiv: Optimization and Control

TL;DR: In decentralized optimization, nodes cooperate to minimize an overall objective function that is the sum (or average) of per-node private objective functions, interleaving local computations with communication among all or a subset of the nodes; this paper surveys recent work in the area, highlighting the role of the network topology.

Abstract: In decentralized optimization, nodes cooperate to minimize an overall objective function that is the sum (or average) of per-node private objective functions. Algorithms interleave local computations with communication among all or a subset of the nodes. Motivated by a variety of applications---distributed estimation in sensor networks, fitting models to massive data sets, and distributed control of multi-robot systems, to name a few---significant advances have been made towards the development of robust, practical algorithms with theoretical performance guarantees. This paper presents an overview of recent work in this area. In general, rates of convergence depend not only on the number of nodes involved and the desired level of accuracy, but also on the structure and nature of the network over which nodes communicate (e.g., whether links are directed or undirected, static or time-varying). We survey the state-of-the-art algorithms and their analyses tailored to these different scenarios, highlighting the role of the network topology.

Posted Content

General Heuristics for Nonconvex Quadratically Constrained Quadratic Programming

Jaehyun Park, Stephen Boyd

22 Mar 2017-arXiv: Optimization and Control

TL;DR: This paper introduces the Suggest-and-Improve framework for general nonconvex quadratically constrained quadratic programs (QCQPs), along with an open-source Python package, QCQP, that implements the heuristics discussed in the paper.

Abstract: We introduce the Suggest-and-Improve framework for general nonconvex quadratically constrained quadratic programs (QCQPs). Using this framework, we generalize a number of known methods and provide heuristics to get approximate solutions to QCQPs for which no specialized methods are available. We also introduce an open-source Python package QCQP, which implements the heuristics discussed in the paper.

Journal Article

Solving ill-posed inverse problems using iterative deep neural networks

Jonas Adler¹, Ozan Öktem¹

Royal Institute of Technology¹

13 Apr 2017-arXiv: Optimization and Control

TL;DR: The method builds on ideas from classical regularization theory and recent advances in deep learning, performing learning while making use of prior information about the inverse problem encoded in the forward operator, the noise model and a regularizing functional; this results in a gradient-like iterative scheme.

Abstract: We propose a partially learned approach for the solution of ill-posed inverse problems with not necessarily linear forward operators. The method builds on ideas from classical regularization theory and recent advances in deep learning to perform learning while making use of prior information about the inverse problem encoded in the forward operator, noise model and a regularizing functional. The method results in a gradient-like iterative scheme, where the "gradient" component is learned using a convolutional network that includes the gradients of the data discrepancy and regularizer as input in each iteration. We present results of such a partially learned gradient scheme on a non-linear tomographic inversion problem with simulated data from both the Shepp-Logan phantom and a head CT. The outcome is compared against FBP and TV reconstruction, and the proposed method provides a 5.4 dB PSNR improvement over the TV reconstruction while being significantly faster, giving reconstructions of 512 × 512 volumes in about 0.4 seconds using a single GPU.

Journal Article

On Convergence of Extended Dynamic Mode Decomposition to the Koopman Operator

Milan Korda¹, Igor Mezic¹

University of California, Santa Barbara¹

14 Mar 2017-arXiv: Optimization and Control

TL;DR: An analytic version of the EDMD algorithm is proposed which, under some assumptions, allows one to construct the Koopman operator directly, without the use of sampling, and convergence of the predictions of future values of a given observable over any finite time horizon is implied.

Abstract: Extended Dynamic Mode Decomposition (EDMD) is an algorithm that approximates the action of the Koopman operator on an $N$-dimensional subspace of the space of observables by sampling at $M$ points in the state space. Assuming that the samples are drawn either independently or ergodically from some measure $\mu$, it was shown that, in the limit as $M\rightarrow\infty$, the EDMD operator $\mathcal{K}_{N,M}$ converges to $\mathcal{K}_N$, where $\mathcal{K}_N$ is the $L_2(\mu)$-orthogonal projection of the action of the Koopman operator on the finite-dimensional subspace of observables. In this work, we show that, as $N \rightarrow \infty$, the operator $\mathcal{K}_N$ converges in the strong operator topology to the Koopman operator. This in particular implies convergence of the predictions of future values of a given observable over any finite time horizon, a fact important for practical applications such as forecasting, estimation and control. In addition, we show that accumulation points of the spectra of $\mathcal{K}_N$ correspond to the eigenvalues of the Koopman operator with the associated eigenfunctions converging weakly to an eigenfunction of the Koopman operator, provided that the weak limit of eigenfunctions is nonzero. As a by-product, we propose an analytic version of the EDMD algorithm which, under some assumptions, allows one to construct $\mathcal{K}_N$ directly, without the use of sampling. Finally, under additional assumptions, we analyze convergence of $\mathcal{K}_{N,N}$ (i.e., $M=N$), proving convergence, along a subsequence, to weak eigenfunctions (or eigendistributions) related to the eigenmeasures of the Perron-Frobenius operator. No assumptions on the observables belonging to a finite-dimensional invariant subspace of the Koopman operator are required throughout.
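
The EDMD matrix whose limits are studied here is a least-squares fit, $\mathcal{K}_{N,M} = G^{+}A$ with $G = \frac1M\sum_m \psi(x_m)\psi(x_m)^\top$ and $A = \frac1M\sum_m \psi(x_m)\psi(y_m)^\top$, where $y_m$ is the image of the sample $x_m$ under the dynamics. The sketch below is ours; the example system and dictionary are hypothetical. It uses the linear map $x \mapsto 0.5x$ with monomials $[1, x, x^2]$, whose span happens to be Koopman-invariant, so EDMD recovers the eigenvalues $1, 0.5, 0.25$ exactly and the output is easy to check. The paper's convergence results do not require such an invariant subspace.

```python
import numpy as np


def edmd(X, Y, dictionary):
    """EDMD matrix K_{N,M} = G^+ A from sample pairs (x_m, y_m)."""
    Psi_X = np.array([dictionary(x) for x in X])   # M x N
    Psi_Y = np.array([dictionary(y) for y in Y])   # M x N
    M = len(X)
    G = Psi_X.T @ Psi_X / M
    A = Psi_X.T @ Psi_Y / M
    return np.linalg.pinv(G) @ A


# Hypothetical example: dynamics x -> 0.5 x, samples drawn uniformly on [-1, 1].
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=200)
Y = 0.5 * X
K = edmd(X, Y, lambda x: np.array([1.0, x, x ** 2]))
print(np.sort(np.linalg.eigvals(K).real))   # approximately [0.25, 0.5, 1.0]
```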

Posted Content

Data-driven discovery of Koopman eigenfunctions for control

Eurika Kaiser¹, J. Nathan Kutz, Steven L. Brunton

University of Washington¹

04 Jul 2017-arXiv: Optimization and Control

TL;DR: This paper introduces Koopman Reduced Order Nonlinear Identification and Control (KRONIC), a data-driven control architecture in which Koopman eigenfunctions are approximated from data using the partial differential equation governing the infinitesimal generator of the Koopman operator.

Abstract: Data-driven transformations that reformulate nonlinear systems in a linear framework have the potential to enable the prediction, estimation, and control of strongly nonlinear dynamics using linear systems theory. The Koopman operator has emerged as a principled linear embedding of nonlinear dynamics, and its eigenfunctions establish intrinsic coordinates along which the dynamics behave linearly. Previous studies have used finite-dimensional approximations of the Koopman operator for model-predictive control approaches. In this work, we illustrate a fundamental closure issue of this approach and argue that it is beneficial to first validate eigenfunctions and then construct reduced-order models in these validated eigenfunctions. These coordinates form a Koopman-invariant subspace by design and, thus, have improved predictive power. We show then how the control can be formulated directly in these intrinsic coordinates and discuss potential benefits and caveats of this perspective. The resulting control architecture is termed Koopman Reduced Order Nonlinear Identification and Control (KRONIC). It is demonstrated that these eigenfunctions can be approximated with data-driven regression and power series expansions, based on the partial differential equation governing the infinitesimal generator of the Koopman operator. Validating discovered eigenfunctions is crucial and we show that lightly damped eigenfunctions may be faithfully extracted from EDMD or an implicit formulation. These lightly damped eigenfunctions are particularly relevant for control, as they correspond to nearly conserved quantities that are associated with persistent dynamics, such as the Hamiltonian. KRONIC is then demonstrated on a number of relevant examples, including 1) a nonlinear system with a known linear embedding, 2) a variety of Hamiltonian systems, and 3) a high-dimensional double-gyre model for ocean mixing.

Posted Content

Event-Triggered Communication and Control of Network Systems for Multi-Agent Consensus

Cameron Nowzari¹, Eloy Garcia, Jorge E. Cortes²

George Mason University¹, University of California, San Diego²

01 Dec 2017-arXiv: Optimization and Control

TL;DR: A comprehensive account of the motivations behind the use of event-triggered strategies for consensus, the methods for algorithm synthesis, the technical challenges involved in establishing desirable properties of the resulting implementations, and their applications in distributed control is provided.

Abstract: This article provides an introduction to event-triggered coordination for multi-agent average consensus. We provide a comprehensive account of the motivations behind the use of event-triggered strategies for consensus, the methods for algorithm synthesis, the technical challenges involved in establishing desirable properties of the resulting implementations, and their applications in distributed control. We pay special attention to the assumptions on the capabilities of the network agents and the resulting features of the algorithm execution, including the interconnection topology, the evaluation of triggers, and the role of imperfect information. The issues raised in our discussion transcend the specific consensus problem and are indeed characteristic of cooperative algorithms for networked systems that solve other coordination tasks. As our discussion progresses, we make these connections clear, highlighting general challenges that are widespread in the event-triggered control of networked systems, along with tools to address them.
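
A minimal simulation of one event-triggered consensus scheme: each agent's control uses only the last values broadcast by itself and its neighbors, and an agent re-broadcasts only when its own state has drifted more than a fixed threshold from its last broadcast. The static threshold, the path graph, forward-Euler time stepping, and the function name are our assumptions; the article surveys many trigger designs, and this is only the simplest to write down.

```python
import numpy as np


def event_triggered_consensus(x0, laplacian, dt, steps, threshold):
    """Forward-Euler simulation of x_i' = -sum_j a_ij (xhat_i - xhat_j).

    xhat_i is the last state agent i broadcast; agent i re-broadcasts only
    when |x_i - xhat_i| exceeds `threshold` (a static trigger assumed here
    for simplicity).
    """
    x = np.array(x0, dtype=float)
    xhat = x.copy()
    broadcasts = 0
    for _ in range(steps):
        fire = np.abs(x - xhat) > threshold   # event check, local to each agent
        xhat[fire] = x[fire]                  # triggered agents broadcast
        broadcasts += int(fire.sum())
        x = x - dt * (laplacian @ xhat)       # control uses broadcast values only
    return x, broadcasts


# Hypothetical path graph 0 - 1 - 2 - 3 with unit weights.
laplacian = np.array([[1.0, -1.0, 0.0, 0.0],
                      [-1.0, 2.0, -1.0, 0.0],
                      [0.0, -1.0, 2.0, -1.0],
                      [0.0, 0.0, -1.0, 1.0]])
steps = 2000
x, sent = event_triggered_consensus([0.0, 1.0, 2.0, 3.0], laplacian,
                                    dt=0.01, steps=steps, threshold=1e-3)
print(x, sent, "broadcasts out of", 4 * steps, "possible")
```

Because the graph is undirected, the columns of the Laplacian sum to zero and the sum of the states is preserved, so the agents agree near the initial average 1.5 while broadcasting far less often than once per time step.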

Journal Article

Best practices for comparing optimization algorithms

Vahid Beiranvand¹, Warren Hare¹, Yves Lucet¹

University of British Columbia¹

24 Sep 2017-arXiv: Optimization and Control

TL;DR: This paper systematically reviews the benchmarking process of optimization algorithms, provides suggestions for each step of the comparison process, and highlights the pitfalls to avoid when evaluating the performance of optimization algorithms.

Abstract: Comparing, or benchmarking, optimization algorithms is a complicated task that involves many subtle considerations to yield a fair and unbiased evaluation. In this paper, we systematically review the benchmarking process of optimization algorithms, and discuss the challenges of fair comparison. We provide suggestions for each step of the comparison process and highlight the pitfalls to avoid when evaluating the performance of optimization algorithms. We also discuss various methods of reporting the benchmarking results. Finally, some suggestions for future research are presented to improve the current benchmarking process.
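
One widely used way of reporting benchmarking results is the performance profile of Dolan and Moré: for each solver $s$, $\rho_s(\tau)$ is the fraction of problems on which its cost is within a factor $\tau$ of the best solver on that problem. A minimal sketch, assuming costs are stored in a problems-by-solvers array with failures encoded as `np.inf` (our convention; the data are hypothetical):

```python
import numpy as np


def performance_profile(costs, taus):
    """Dolan-More performance profile.

    costs[p, s] is the cost (e.g. run time) of solver s on problem p, with
    failures encoded as np.inf. A problem that no solver solves yields NaN
    ratios, which count as failures for every solver. Returns rho with
    rho[k, s] = fraction of problems on which solver s is within a factor
    taus[k] of the best solver for that problem.
    """
    costs = np.asarray(costs, dtype=float)
    best = costs.min(axis=1, keepdims=True)
    ratios = costs / best            # performance ratios r_{p,s} >= 1
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])


# Hypothetical results: 3 problems x 2 solvers; solver 0 fails on problem 3.
costs = [[1.0, 2.0],
         [3.0, 3.0],
         [np.inf, 4.0]]
profile = performance_profile(costs, taus=[1.0, 2.0])
print(profile)
```

Reading the rows: at $\tau = 1$ each solver is the fastest on two of the three problems, and at $\tau = 2$ solver 1 is within a factor of two of the best on all of them.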

Posted Content

Scaling Algorithms for Unbalanced Transport Problems

Lénaïc Chizat, Gabriel Peyré¹, Bernhard Schmitzer, François-Xavier Vialard²

École Normale Supérieure¹, Paris Dauphine University²

13 Jan 2017-arXiv: Optimization and Control

TL;DR: This article introduces a new class of fast algorithms to approximate variational problems involving unbalanced optimal transport, and shows how these methods can be used to solve unbalanced transport, unbalanced gradient flows, and to compute unbalanced barycenters.

Abstract: This article introduces a new class of fast algorithms to approximate variational problems involving unbalanced optimal transport. While classical optimal transport considers only normalized probability distributions, it is important for many applications to be able to compute some sort of relaxed transportation between arbitrary positive measures. A generic class of such “unbalanced” optimal transport problems has been recently proposed by several authors. In this paper, we show how to extend the now-classical entropic regularization scheme to these unbalanced problems. This gives rise to fast, highly parallelizable algorithms that operate by performing only diagonal scaling (i.e. pointwise multiplications) of the transportation couplings. They are generalizations of the celebrated Sinkhorn algorithm. We show how these methods can be used to solve unbalanced transport, unbalanced gradient flows, and to compute unbalanced barycenters. We showcase applications to 2-D shape modification, color transfer, and growth models.
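
The diagonal-scaling structure can be seen in a short sketch. We assume the setting where both marginal constraints are replaced by Kullback-Leibler penalties with weight $\lambda$; the scaling updates then become $u = (a / Kv)^{\lambda/(\lambda+\varepsilon)}$ and $v = (b / K^\top u)^{\lambda/(\lambda+\varepsilon)}$ with Gibbs kernel $K = e^{-C/\varepsilon}$, and letting $\lambda \to \infty$ recovers the classic Sinkhorn iteration. Names, data and parameter values are hypothetical.

```python
import numpy as np


def unbalanced_sinkhorn(a, b, C, eps, lam, n_iter=2000):
    """Scaling iterations for entropic OT with KL marginal penalties (weight lam).

    Only the diagonal scalings u, v of the Gibbs kernel K are updated; the
    returned coupling is diag(u) K diag(v).
    """
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    power = lam / (lam + eps)          # -> 1 as lam -> inf (balanced Sinkhorn)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]


# Hypothetical 1-D example: three points, squared-distance cost.
pts = np.array([0.0, 0.5, 1.0])
C = (pts[:, None] - pts[None, :]) ** 2
a = np.array([0.2, 0.3, 0.5])
b = np.array([0.4, 0.4, 0.2])
P_bal = unbalanced_sinkhorn(a, b, C, eps=0.5, lam=1e6)      # nearly balanced
P_unb = unbalanced_sinkhorn(a, 2 * b, C, eps=0.5, lam=1.0)  # target has twice the mass
print(P_bal.sum(axis=1), P_bal.sum(axis=0), P_unb.sum())
```

With a very large $\lambda$ the coupling reproduces both marginals, as balanced Sinkhorn would. With $\lambda = 1$ and a target of twice the mass, a coupling is still produced, although the two marginals cannot both be matched since the measures have different total mass.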

Posted Content

Exact Diffusion for Distributed Optimization and Learning --- Part I: Algorithm Development

Kun Yuan¹, Bicheng Ying¹, Xiaochuan Zhao², Ali H. Sayed³

University of California, Los Angeles¹, Goldman Sachs², École Polytechnique Fédérale de Lausanne³

16 Feb 2017-arXiv: Optimization and Control

TL;DR: The exact diffusion method is applicable to locally balanced left-stochastic combination matrices which, compared to the conventional doubly stochastic matrix, are more general and able to endow the algorithm with faster convergence rates, more flexible step-size choices, and improved privacy-preserving properties.

Abstract: This work develops a distributed optimization strategy with guaranteed exact convergence for a broad class of left-stochastic combination policies. The resulting exact diffusion strategy is shown in Part II to have a wider stability range and superior convergence performance than the EXTRA strategy. The exact diffusion solution is applicable to non-symmetric left-stochastic combination matrices, while many earlier developments on exact consensus implementations are limited to doubly-stochastic matrices; these latter matrices impose stringent constraints on the network topology. The derivation of the exact diffusion strategy in this work relies on reformulating the aggregate optimization problem as a penalized problem and resorting to a diagonally-weighted incremental construction. Detailed stability and convergence analyses are pursued in Part II and are facilitated by examining the evolution of the error dynamics in a transformed domain. Numerical simulations illustrate the theoretical conclusions.
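
A minimal sketch of the adapt-correct-combine recursion, restricted (our simplification) to a symmetric doubly stochastic combination matrix $A$, with the combination step using $\bar A = (I + A)/2$. The paper's actual contribution covers the more general locally balanced left-stochastic case, which this sketch does not handle; the topology, objectives and names are hypothetical. Unlike plain diffusion with a constant step size, which settles near but not at the optimum when local objectives differ, every node here converges to the exact global minimizer.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic weights for a ring of n >= 3 nodes."""
    A = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            A[i, j % n] += 1.0 / 3.0
    return A


def exact_diffusion(grad, w0, A, mu, steps):
    """Exact diffusion sketch for a symmetric doubly stochastic A.

    Row k of w is node k's iterate; grad(w) returns the stacked local gradients.
    """
    n = A.shape[0]
    A_bar = 0.5 * (np.eye(n) + A)
    w = np.array(w0, dtype=float)
    # Initializing psi_prev = w0 makes the first iteration a plain
    # adapt-then-combine step, which keeps the network-average invariant
    # needed for exact convergence.
    psi_prev = w.copy()
    for _ in range(steps):
        psi = w - mu * grad(w)       # adapt: local gradient step
        phi = psi + w - psi_prev     # correct: add back w minus the previous psi
        w = A_bar @ phi              # combine: average with neighbors
        psi_prev = psi
    return w


# Hypothetical problem: node k holds J_k(w) = 0.5 * (w - b_k)^2,
# so the minimizer of sum_k J_k is mean(b) = 3.5.
b = np.arange(8, dtype=float).reshape(-1, 1)
w = exact_diffusion(lambda w: w - b, np.zeros_like(b), ring_mixing_matrix(8),
                    mu=0.5, steps=3000)
print(w.ravel())
```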

Posted Content•

Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods

Nicolas Loizou¹, Peter Richtárik²

Université de Montréal¹, King Abdullah University of Science and Technology²

27 Dec 2017-arXiv: Optimization and Control

TL;DR: In this paper, the authors study several classes of stochastic optimization algorithms enriched with heavy ball momentum and prove global nonasymptotic linear convergence rates for all methods and various measures of success.

Abstract: In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent. We prove global nonasymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates (in the L2 sense), and dual function values. We also show that the primal iterates converge at an accelerated linear rate in the L1 sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.
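
One concrete instance of the stochastic heavy ball method, sketched under assumptions: for a consistent linear system A x = b, SGD applied to a stochastic reformulation with single-row sampling reduces to randomized Kaczmarz, and adding momentum gives the update below. The function name, uniform row sampling, and the values of the relaxation omega and momentum beta are illustrative choices, not the paper's experimental settings.

```python
import random


def stochastic_heavy_ball_kaczmarz(A, b, omega=1.0, beta=0.2, iters=3000, seed=0):
    """Randomized Kaczmarz (SGD for consistent A x = b) with heavy ball momentum."""
    rng = random.Random(seed)
    n = len(A[0])
    x_prev = [0.0] * n
    x = [0.0] * n
    for _ in range(iters):
        i = rng.randrange(len(A))                    # sample one equation uniformly
        row = A[i]
        resid = sum(r * xj for r, xj in zip(row, x)) - b[i]
        scale = omega * resid / sum(r * r for r in row)
        # gradient step on the sampled equation plus momentum beta * (x_k - x_{k-1})
        x_next = [xj - scale * r + beta * (xj - xp)
                  for xj, r, xp in zip(x, row, x_prev)]
        x_prev, x = x, x_next
    return x
```

Setting beta = 0 recovers plain randomized Kaczmarz, which makes it easy to compare the two variants on the same system.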

Posted Content•

Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information

Peng Xu¹, Farbod Roosta-Khorasani², Michael W. Mahoney³

Stanford University¹, University of Queensland², University of California, Berkeley³

23 Aug 2017-arXiv: Optimization and Control

TL;DR: The canonical problem of finite-sum minimization is considered, appropriate uniform and non-uniform sub-sampling strategies are provided to construct inexact Hessian approximations, and optimal iteration complexity is obtained for the corresponding sub-sampled trust-region and adaptive cubic regularization methods.

Abstract: We consider variants of trust-region and cubic regularization methods for non-convex optimization, in which the Hessian matrix is approximated. Under mild conditions on the inexact Hessian, and using approximate solutions of the corresponding sub-problems, we provide iteration complexity bounds for achieving $\epsilon$-approximate second-order optimality, which have been shown to be tight. Our Hessian approximation conditions constitute a major relaxation over the existing ones in the literature. Consequently, we are able to show that such mild conditions allow for the construction of the approximate Hessian through various random sampling methods. In this light, we consider the canonical problem of finite-sum minimization, provide appropriate uniform and non-uniform sub-sampling strategies to construct such Hessian approximations, and obtain optimal iteration complexity for the corresponding sub-sampled trust-region and cubic regularization methods.
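
A one-dimensional sketch of a sub-sampled cubic regularization loop, under assumptions of our own: the gradient is exact, the Hessian is averaged over a uniform sub-sample drawn without replacement, the cubic model is minimized in closed form, and sigma is simply halved on acceptance and doubled on rejection. The function names and this update rule are hypothetical, not the paper's exact algorithm or constants.

```python
import math
import random


def cubic_step(g, H, sigma):
    """Global minimizer s of the 1-D model g*s + H*s**2/2 + sigma*|s|**3/3."""
    disc = math.sqrt(H * H + 4.0 * sigma * abs(g))
    if H > 0:
        r = 2.0 * abs(g) / (H + disc)   # algebraically equal form, avoids cancellation
    else:
        r = (disc - H) / (2.0 * sigma)  # also covers g == 0 with H < 0 (saddle escape)
    return -r if g > 0 else r           # step against the sign of the gradient


def subsampled_arc(fs, grads, hessians, x, sigma=1.0, sample=2, iters=100,
                   eta=0.1, seed=0):
    """Adaptive cubic regularization for (1/n) * sum_i f_i(x) with scalar x."""
    rng = random.Random(seed)
    n = len(fs)

    def f(z):
        return sum(fi(z) for fi in fs) / n

    for _ in range(iters):
        g = sum(gi(x) for gi in grads) / n                # exact gradient
        idx = rng.sample(range(n), sample)                # uniform sub-sample
        H = sum(hessians[i](x) for i in idx) / sample     # sub-sampled Hessian
        s = cubic_step(g, H, sigma)
        predicted = -(g * s + 0.5 * H * s * s + sigma * abs(s) ** 3 / 3.0)
        if predicted <= 0.0:
            break                                         # model predicts no decrease
        rho = (f(x) - f(x + s)) / predicted
        if rho >= eta:                                    # successful step
            x += s
            sigma = max(0.5 * sigma, 1e-8)
        else:                                             # unsuccessful: regularize more
            sigma *= 2.0
    return x
```

Started at the saddle point x = 0 of a sum of terms (x^2 - c_i)^2/4, the gradient vanishes but the sampled curvature is negative, so the cubic step still moves away from the saddle.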

Journal Article•DOI•

A Passivity-Based Approach to Nash Equilibrium Seeking over Networks

Dian Gadjov¹, Lacra Pavel¹

University of Toronto¹

06 May 2017-arXiv: Optimization and Control

TL;DR: In this paper, the authors consider the problem of distributed Nash equilibrium seeking over networks, a setting in which players have limited local information, and show how to modify gradient-play dynamics, which assume perfect information (i.e., instantaneous all-to-all player communication), for the case of partial or networked information between players.

Abstract: In this paper we consider the problem of distributed Nash equilibrium (NE) seeking over networks, a setting in which players have limited local information. We start from a continuous-time gradient-play dynamics that converges to an NE under strict monotonicity of the pseudo-gradient and assumes perfect information, i.e., instantaneous all-to-all player communication. We consider how to modify this gradient-play dynamics in the case of partial, or networked information between players. We propose an augmented gradient-play dynamics with correction in which players communicate locally only with their neighbours to compute an estimate of the other players' actions. We derive the new dynamics based on the reformulation as a multi-agent coordination problem over an undirected graph. We exploit incremental passivity properties and show that a synchronizing, distributed Laplacian feedback can be designed using relative estimates of the neighbours. Under a strict monotonicity property of the pseudo-gradient, we show that the augmented gradient-play dynamics converges to consensus on the NE of the game. We further discuss two cases that highlight the tradeoff between properties of the game and the communication graph.
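
A forward-Euler sketch of gradient play augmented with Laplacian feedback on action estimates, in the spirit of the dynamics described above. Assumptions: each player i keeps a full vector of estimates, only its own entry is also driven by the gradient of its own cost, neighbours exchange estimates over an undirected graph, and the gain c, step dt, and the function name are illustrative choices rather than values from the paper.

```python
def augmented_gradient_play(partial_grads, neighbors, n, c=5.0, dt=0.01, steps=20000):
    """Euler simulation of gradient play with Laplacian consensus on estimates.

    est[i][j] is player i's estimate of player j's action; est[i][i] is i's own action.
    partial_grads[i](z) returns dJ_i/dx_i evaluated at the estimate vector z.
    """
    est = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        new = []
        for i in range(n):
            row = []
            for j in range(n):
                # Laplacian feedback built from relative estimates of neighbours only.
                drift = -c * sum(est[i][j] - est[k][j] for k in neighbors[i])
                if j == i:
                    # own action also descends the local gradient at i's estimates
                    drift -= partial_grads[i](est[i])
                row.append(est[i][j] + dt * drift)
            new.append(row)
        est = new
    return [est[i][i] for i in range(n)]
```

As a usage example, take three players on a path graph with costs J_i = x_i^2/2 + 0.2 x_i (sum of the other actions) - b_i x_i. Their pseudo-gradient is strictly monotone, and the players' own actions converge to the Nash equilibrium even though player 0 never talks to player 2 directly.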

Journal Article•DOI•

A Novel Consensus-based Distributed Algorithm for Economic Dispatch Based on Local Estimation of Power Mismatch

Hajir Pourbabak¹, Jingwei Luo¹, Tao Chen¹, Wencong Su¹

University of Michigan¹

22 Aug 2017-arXiv: Optimization and Control

TL;DR: In this paper, a consensus-based distributed control algorithm for solving the economic dispatch problem of distributed generators is proposed, where a legacy central controller can be eliminated in order to avoid a single point of failure, relieve computational burden, maintain data privacy, and support plug-and-play functionalities.

Abstract: This paper proposes a novel consensus-based distributed control algorithm for solving the economic dispatch problem of distributed generators. A legacy central controller can be eliminated in order to avoid a single point of failure, relieve computational burden, maintain data privacy, and support plug-and-play functionalities. The optimal economic dispatch is achieved by allowing the iterative coordination of local agents (consumers and distributed generators). As coordination information, the local estimation of power mismatch is shared among distributed generators through communication networks and does not contain any private information, ultimately contributing to a fair electricity market. Additionally, the proposed distributed algorithm is particularly designed for easy implementation and configuration of a large number of agents in which the distributed decision making can be implemented in a simple proportional-integral (PI) or integral (I) controller. In MATLAB/Simulink simulation, the accuracy of the proposed distributed algorithm is demonstrated in a 29-node system in comparison with the centralized algorithm. Scalability and a fast convergence rate are also demonstrated in a 1400-node case study. Further, the experimental test demonstrates the practical performance of the proposed distributed algorithm using the VOLTTRON platform and a cluster of low-cost credit-card-size single-board PCs.
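
A generic consensus sketch of this idea, not the authors' PI/I controller: each generator keeps an incremental cost lam_i and a local estimate y_i of the power mismatch. Each lam_i is averaged with its neighbours and nudged by the mismatch estimate, and y_i is averaged and corrected by the change in local output. Quadratic costs a_i P^2 + b_i P, no generator limits, a fixed doubly stochastic weight matrix W, and the gain eps are all assumptions, and consensus_dispatch is a hypothetical name.

```python
def consensus_dispatch(a, b, demand, W, eps=0.05, iters=2000):
    """Economic dispatch via consensus on incremental cost with mismatch tracking."""
    n = len(a)
    lam = list(b)                                        # start with P_i = 0
    P = [(lam[i] - b[i]) / (2 * a[i]) for i in range(n)]
    y = [demand[i] - P[i] for i in range(n)]             # local power-mismatch estimates
    for _ in range(iters):
        lam_new = [sum(W[i][j] * lam[j] for j in range(n)) + eps * y[i]
                   for i in range(n)]
        P_new = [(lam_new[i] - b[i]) / (2 * a[i]) for i in range(n)]  # local optimality
        y = [sum(W[i][j] * y[j] for j in range(n)) - (P_new[i] - P[i])
             for i in range(n)]
        lam, P = lam_new, P_new
    return lam, P
```

Because W is doubly stochastic, the sum of the mismatch estimates always equals total demand minus total generation. At a fixed point the mismatch is therefore zero and all generators share a common incremental cost, which is the optimality condition for this unconstrained quadratic dispatch. Only the mismatch estimates and incremental costs are exchanged, never the cost coefficients.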

Journal Article•DOI•

Data-Driven Sparse Sensor Placement for Reconstruction

Krithika Manohar, Bingni W. Brunton, J. Nathan Kutz, Steven L. Brunton

26 Jan 2017-arXiv: Optimization and Control

TL;DR: In this paper, sparse point sensors for signal reconstruction in high-dimensional, high-bandwidth systems are found by applying the singular value decomposition and QR pivoting to a tailored library of features extracted from training data.

Abstract: Optimal sensor placement is a central challenge in the design, prediction, estimation, and control of high-dimensional systems. High-dimensional states can often leverage a latent low-dimensional representation, and this inherent compressibility enables sparse sensing. This article explores optimized sensor placement for signal reconstruction based on a tailored library of features extracted from training data. Sparse point sensors are discovered using the singular value decomposition and QR pivoting, which are two ubiquitous matrix computations that underpin modern linear dimensionality reduction. Sparse sensing in a tailored basis is contrasted with compressed sensing, a universal signal recovery method in which an unknown signal is reconstructed via a sparse representation in a universal basis. Although compressed sensing can recover a wider class of signals, we demonstrate the benefits of exploiting known patterns in data with optimized sensing. In particular, drastic reductions in the required number of sensors and improved reconstruction are observed in examples ranging from facial images to fluid vorticity fields. Principled sensor placement may be critically enabling when sensors are costly and provides faster state estimation for low-latency, high-bandwidth control. MATLAB code is provided for all examples.
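
The pivoting step can be sketched in plain Python, assuming the tailored basis Psi (n candidate locations by r modes, for example the leading left singular vectors of the training data) is already available. Greedy column-pivoted QR of Psi^T selects r rows, and the state is reconstructed from those r point measurements as Psi (C Psi)^{-1} y. The helper names are hypothetical, and the small solver stands in for a library routine; the paper provides MATLAB code, which this does not reproduce.

```python
import math


def qr_pivot_sensors(Psi, r):
    """Greedy column-pivoted QR on Psi^T; returns r sensor (row) indices of Psi."""
    cols = [list(row) for row in Psi]  # column k of Psi^T is row k of Psi
    chosen = []
    for _ in range(r):
        norms = [sum(v * v for v in col) for col in cols]
        k = max((i for i in range(len(cols)) if i not in chosen), key=lambda i: norms[i])
        chosen.append(k)                                   # pivot: largest residual norm
        q = [v / math.sqrt(norms[k]) for v in cols[k]]
        for i in range(len(cols)):
            if i not in chosen:                            # Gram-Schmidt: remove q component
                d = sum(qv * cv for qv, cv in zip(q, cols[i]))
                cols[i] = [cv - d * qv for qv, cv in zip(q, cols[i])]
    return chosen


def solve(M, y):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(M)
    A = [list(M[i]) + [y[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(c + 1, n):
            f = A[i][c] / A[c][c]
            A[i] = [aij - f * acj for aij, acj in zip(A[i], A[c])]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (A[i][n] - sum(A[i][j] * z[j] for j in range(i + 1, n))) / A[i][i]
    return z


def reconstruct(Psi, sensors, y):
    """x_hat = Psi (C Psi)^{-1} y, where C selects the sensor rows of Psi."""
    coeffs = solve([Psi[s] for s in sensors], y)
    return [sum(p * aj for p, aj in zip(row, coeffs)) for row in Psi]
```

For a two-mode basis on five locations, the two selected rows give an invertible 2x2 system, so any signal lying in the span of Psi is recovered exactly from just those two readings.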

Journal Article•DOI•

Sparse identification of nonlinear dynamics for model predictive control in the low-data limit

Eurika Kaiser, J. Nathan Kutz, Steven L. Brunton

15 Nov 2017-arXiv: Optimization and Control

TL;DR: In this article, the authors extend the sparse identification of nonlinear dynamics (SINDY) modeling procedure to include the effects of actuation and demonstrate the ability of these models to enhance the performance of model predictive control (MPC) based on limited, noisy data.

Abstract: The data-driven discovery of dynamics via machine learning is currently pushing the frontiers of modeling and control efforts, and it provides a tremendous opportunity to extend the reach of model predictive control. However, many leading methods in machine learning, such as neural networks, require large volumes of training data, may not be interpretable, do not easily include known constraints and symmetries, and often do not generalize beyond the attractor where models are trained. These factors limit the use of these techniques for the online identification of a model in the low-data limit, for example following an abrupt change to the system dynamics. In this work, we extend the recent sparse identification of nonlinear dynamics (SINDY) modeling procedure to include the effects of actuation and demonstrate the ability of these models to enhance the performance of model predictive control (MPC), based on limited, noisy data. SINDY models are parsimonious, identifying the fewest terms in the model needed to explain the data, making them interpretable, generalizable, and reducing the burden of training data. We show that the resulting SINDY-MPC framework has higher performance, requires significantly less data, and is more computationally efficient and robust to noise than neural network models, making it viable for online training and execution in response to rapid changes to the system. SINDY-MPC also shows improved performance over linear data-driven models, although linear models may provide a stopgap until enough data is available for SINDY. SINDY-MPC is demonstrated on a variety of dynamical systems with different challenges, including the chaotic Lorenz system, a simple model for flight control of an F8 aircraft, and an HIV model incorporating drug treatment.
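
The sparse regression at the heart of SINDy is commonly implemented with sequentially thresholded least squares. The sketch below applies it to a candidate library that also contains the input u, which is the actuation extension the abstract describes. The library, the threshold, and the synthetic dynamics dx/dt = -2x + u + 0.5x^2 are assumptions for illustration. Exact derivatives stand in for noisy estimates, and the MPC loop that would use the identified model is not shown.

```python
def solve(M, y):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(M)
    A = [list(M[i]) + [y[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(c + 1, n):
            f = A[i][c] / A[c][c]
            A[i] = [aij - f * acj for aij, acj in zip(A[i], A[c])]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (A[i][n] - sum(A[i][j] * z[j] for j in range(i + 1, n))) / A[i][i]
    return z


def library(x, u):
    """Candidate terms including actuation u: [1, x, u, x^2, x*u, u^2, x^3]."""
    return [1.0, x, u, x * x, x * u, u * u, x ** 3]


def stlsq(Theta, dx, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares: sparse xi with Theta @ xi ~ dx."""
    m = len(Theta[0])
    active = list(range(m))
    xi = [0.0] * m
    for _ in range(max_iter):
        # Least squares on the active library terms via the normal equations.
        G = [[sum(row[p] * row[q] for row in Theta) for q in active] for p in active]
        h = [sum(row[p] * d for row, d in zip(Theta, dx)) for p in active]
        coef = solve(G, h)
        xi = [0.0] * m
        for j, cj in zip(active, coef):
            xi[j] = cj
        new_active = [j for j in active if abs(xi[j]) >= threshold]
        if new_active == active:
            break
        active = new_active  # prune small coefficients, then refit
    return xi
```

Only the terms x, u and x^2 survive the thresholding, which is the parsimony the abstract refers to: the identified model is a short, readable formula that a model predictive controller can evaluate cheaply.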

Posted Content•

Grid-forming Control for Power Converters based on Matching of Synchronous Machines

Taouba Jouini, Catalin Arghir, Florian Dörfler

28 Jun 2017-arXiv: Optimization and Control

TL;DR: A novel grid-forming converter control strategy is proposed that builds on the main characteristic of a synchronous machine (SM), the presence of an internal rotating magnetic field: the converter is augmented with a virtual oscillator whose frequency is driven by the DC-side voltage measurement and which sets the converter pulse-width-modulation signal.

Abstract: We consider the problem of grid-forming control of power converters in low-inertia power systems. Starting from an average-switch three-phase inverter model, we draw parallels to a synchronous machine (SM) model and propose a novel grid-forming converter control strategy which dwells upon the main characteristic of a SM: the presence of an internal rotating magnetic field. In particular, we augment the converter system with a virtual oscillator whose frequency is driven by the DC-side voltage measurement and which sets the converter pulse-width-modulation signal, thereby achieving exact matching between the converter in closed-loop and the SM dynamics. We then provide a sufficient condition assuring existence, uniqueness, and global asymptotic stability of equilibria in a coordinate frame attached to the virtual oscillator angle. By actuating the DC-side input of the converter we are able to enforce this sufficient condition. In the same setting, we highlight strict incremental passivity, droop, and power-sharing properties of the proposed framework, which are compatible with conventional requirements of power system operation. We subsequently adopt disturbance decoupling techniques to design additional control loops that regulate the DC-side voltage, as well as AC-side frequency and amplitude, while in the end validating them with numerical experiments.

Posted Content•

Online Convex Optimization with Time-Varying Constraints

Michael J. Neely, Hao Yu

15 Feb 2017-arXiv: Optimization and Control

TL;DR: An online algorithm is developed that solves the problem with $O(1/\epsilon^2)$ convergence time in the special case when all constraint functions are nonpositive over a common subset of $\mathbb{R}^n$.

Abstract: This paper considers online convex optimization with time-varying constraint functions. Specifically, we have a sequence of convex objective functions $\{f_t(x)\}_{t=0}^{\infty}$ and convex constraint functions $\{g_{t,i}(x)\}_{t=0}^{\infty}$ for $i \in \{1, ..., k\}$. The functions are gradually revealed over time. For a given $\epsilon>0$, the goal is to choose points $x_t$ every step $t$, without knowing the $f_t$ and $g_{t,i}$ functions on that step, to achieve a time average at most $\epsilon$ worse than the best fixed-decision that could be chosen with hindsight, subject to the time average of the constraint functions being nonpositive. It is known that this goal is generally impossible. This paper develops an online algorithm that solves the problem with $O(1/\epsilon^2)$ convergence time in the special case when all constraint functions are nonpositive over a common subset of $\mathbb{R}^n$. Similar performance is shown in an expected sense when the common subset assumption is removed but the constraint functions are assumed to vary according to a random process that is independent and identically distributed (i.i.d.) over time slots $t \in \{0, 1, 2, \ldots\}$. Finally, in the special case when both the constraint and objective functions are i.i.d. over time slots $t$, the algorithm is shown to come within $\epsilon$ of optimality with respect to the best (possibly time-varying) causal policy that knows the full probability distribution.
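
A drift-plus-penalty style sketch of this kind of online algorithm in one dimension. A virtual queue Q accumulates linearized constraint violation, and each new decision is a projected step on V*f_t + Q*g_t, computed from the functions revealed after the previous decision was played. The exact update rules, parameters, and guarantees in the paper may differ. The function name and the choices V = sqrt(T), alpha = T in the usage below are assumptions.

```python
def online_constrained_descent(grad_f, g, grad_g, x0, T, V, alpha, lo, hi):
    """Online updates on [lo, hi] with one virtual queue for a time-varying constraint."""
    x, Q = x0, 0.0
    xs = []
    for t in range(T):
        xs.append(x)                      # x_t is played before f_t, g_t are revealed
        gf, gg, gv = grad_f(t, x), grad_g(t, x), g(t, x)
        # argmin over [lo, hi] of V*gf*(z - x) + Q*gg*(z - x) + alpha*(z - x)**2
        x_next = min(hi, max(lo, x - (V * gf + Q * gg) / (2.0 * alpha)))
        # virtual queue: accumulated (linearized) constraint violation, kept >= 0
        Q = max(Q + gv + gg * (x_next - x), 0.0)
        x = x_next
    return xs
```

With f_t(x) = (x - 2)^2 and g_t(x) = x - 1 held fixed, the best fixed feasible decision is x = 1. The queue grows until its pressure balances the objective's pull toward 2, and the time-averaged decision ends up close to 1.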

Posted Content•

Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study

Peng Xu¹, Farbod Roosta-Khorasani², Michael W. Mahoney³

University of Electronic Science and Technology of China¹, University of Queensland², University of California, Berkeley³

25 Aug 2017-arXiv: Optimization and Control

TL;DR: Detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex ML problems demonstrate that these methods not only can be computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but are also highly robust to hyper-parameter settings.

Abstract: While first-order optimization methods such as stochastic gradient descent (SGD) are popular in machine learning (ML), they come with well-known deficiencies, including relatively slow convergence, sensitivity to the settings of hyper-parameters such as learning rate, stagnation at high training errors, and difficulty in escaping flat regions and saddle points. These issues are particularly acute in highly non-convex settings such as those arising in neural networks. Motivated by this, there has been recent interest in second-order methods that aim to alleviate these shortcomings by capturing curvature information. In this paper, we report detailed empirical evaluations of a class of Newton-type methods, namely sub-sampled variants of trust region (TR) and adaptive regularization with cubics (ARC) algorithms, for non-convex ML problems. In doing so, we demonstrate that these methods not only can be computationally competitive with hand-tuned SGD with momentum, obtaining comparable or better generalization performance, but are also highly robust to hyper-parameter settings. Further, in contrast to SGD with momentum, we show that the manner in which these Newton-type methods employ curvature information allows them to seamlessly escape flat regions and saddle points.

Journal Article•DOI•

Total Variation Denoising via the Moreau Envelope

Ivan Selesnick¹

New York University¹

02 Jan 2017-arXiv: Optimization and Control

TL;DR: This letter describes a generalization of the total variation denoising cost function that can yield more accurate estimation of piecewise-constant signals and involves a nonconvex penalty (regularizer) designed to maintain the convexity of the cost function.

Abstract: Total variation denoising is a nonlinear filtering method well suited for the estimation of piecewise-constant signals observed in additive white Gaussian noise. The method is defined by the minimization of a particular non-differentiable convex cost function. This paper describes a generalization of this cost function that can yield more accurate estimation of piecewise constant signals. The new cost function involves a non-convex penalty (regularizer) designed to maintain the convexity of the cost function. The new penalty is based on the Moreau envelope. The proposed total variation denoising method can be implemented using forward-backward splitting.
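
The role of the Moreau envelope is easiest to see in the scalar analogue, sketched below under that simplifying assumption: the paper works with the total variation functional of a whole signal, whereas this example uses |x|. The Moreau envelope of |x| with parameter alpha is a Huber function, and subtracting it from |x| gives a nonconvex penalty (the scalar minimax-concave penalty) that stops growing beyond 1/alpha. The cost 0.5(y - x)^2 + lam*penalty stays convex when lam*alpha <= 1. All function names are illustrative.

```python
import math


def soft_threshold(x, t):
    """Proximal operator of t*|.|."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def moreau_envelope_abs(x, alpha):
    """min over v of |v| + (alpha/2)(x - v)^2, evaluated at its minimizer."""
    v = soft_threshold(x, 1.0 / alpha)
    return abs(v) + 0.5 * alpha * (x - v) ** 2


def huber(x, alpha):
    """Closed form of the same envelope."""
    return 0.5 * alpha * x * x if abs(x) <= 1.0 / alpha else abs(x) - 0.5 / alpha


def mc_penalty(x, alpha):
    """|x| minus its Moreau envelope: nonconvex, constant 1/(2*alpha) for |x| >= 1/alpha."""
    return abs(x) - moreau_envelope_abs(x, alpha)
```

Because the penalty saturates, large jumps are penalized less than by |x| itself, which is what reduces the amplitude bias of ordinary total variation denoising.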

Posted Content•

Nonsmooth analysis and optimization

Christian Clason

14 Aug 2017-arXiv: Optimization and Control

TL;DR: These lecture notes discuss generalized derivative concepts useful in deriving necessary optimality conditions and numerical algorithms for nondifferentiable optimization problems in inverse problems, imaging, and PDE-constrained optimization.

Abstract: These lecture notes for a graduate course cover generalized derivative concepts useful in deriving necessary optimality conditions and numerical algorithms for nondifferentiable optimization problems in inverse problems, imaging, and PDE-constrained optimization. Treated are convex functions and subdifferentials, Fenchel duality, monotone operators and resolvents, Moreau--Yosida regularization, proximal point and (some) first-order splitting methods, Clarke subdifferentials, and semismooth Newton methods. The required background from functional analysis and calculus of variations is also briefly summarized.
