
Showing papers on "Convex optimization published in 2019"


Proceedings ArticleDOI
15 Jun 2019
TL;DR: The objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories; to solve this objective efficiently, the work exploits two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem.
Abstract: Many meta-learning approaches for few-shot learning rely on simple base learners such as nearest-neighbor classifiers. However, even in the few-shot regime, discriminatively trained linear predictors can offer better generalization. We propose to use these predictors as base learners to learn representations for few-shot learning and show they offer better tradeoffs between feature size and performance across a range of few-shot recognition benchmarks. Our objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories. To efficiently solve the objective, we exploit two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem. This allows us to use high-dimensional embeddings with improved generalization at a modest increase in computational overhead. Our approach, named MetaOptNet, achieves state-of-the-art performance on miniImageNet, tieredImageNet, CIFAR-FS, and FC100 few-shot learning benchmarks.

1,084 citations
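To make the "implicit differentiation of a convex base learner" idea concrete, here is a minimal sketch (not the authors' implementation: MetaOptNet's actual base learner is an SVM solved via its dual) using a ridge-regression head, the simplest convex learner whose optimality conditions admit a closed form, so autograd carries gradients from the few-shot loss back into the embedding:

```python
# Minimal sketch (illustrative, not MetaOptNet's SVM head): a differentiable
# ridge-regression base learner on top of learned embeddings.
import torch

def ridge_head(support_feats, support_labels_onehot, query_feats, lam=1.0):
    # W* solves the convex problem min_W ||S W - Y||^2 + lam ||W||^2, whose
    # optimality condition (S^T S + lam I) W = S^T Y is a linear system,
    # hence differentiable with respect to the support features.
    S, Y = support_feats, support_labels_onehot        # (n, d), (n, c)
    d = S.shape[1]
    A = S.T @ S + lam * torch.eye(d)
    W = torch.linalg.solve(A, S.T @ Y)                 # autograd differentiates the solve
    return query_feats @ W

# Toy usage: gradients reach the embedding parameters through the solve.
emb = torch.nn.Linear(32, 16)
xs = torch.randn(10, 32)
ys = torch.nn.functional.one_hot(torch.arange(10) % 5, 5).float()
xq, yq = torch.randn(15, 32), torch.randint(0, 5, (15,))
logits = ridge_head(emb(xs), ys, emb(xq))
torch.nn.functional.cross_entropy(logits, yq).backward()  # d loss / d embedding
```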


Journal ArticleDOI
TL;DR: This paper derives a closed-form propulsion power consumption model for rotary-wing UAVs and proposes a new path discretization method that transforms the original problem into a discretized equivalent with a finite number of optimization variables; the resulting designs significantly outperform the benchmark schemes.
Abstract: This paper studies unmanned aerial vehicle (UAV)-enabled wireless communication, where a rotary-wing UAV is dispatched to communicate with multiple ground nodes (GNs). We aim to minimize the total UAV energy consumption, including both propulsion energy and communication related energy, while satisfying the communication throughput requirement of each GN. To this end, we first derive a closed-form propulsion power consumption model for rotary-wing UAVs, and then formulate the energy minimization problem by jointly optimizing the UAV trajectory and communication time allocation among GNs, as well as the total mission completion time. The problem is difficult to solve optimally, as it is non-convex and involves infinitely many variables over time. To tackle this problem, we first consider the simple fly-hover-communicate design, where the UAV successively visits a set of hovering locations and communicates with one corresponding GN while hovering at each location. For this design, we propose an efficient algorithm to optimize the hovering locations and durations, as well as the flying trajectory connecting these hovering locations, by leveraging the travelling salesman problem with neighborhoods and convex optimization techniques. Next, we consider the general case, where the UAV also communicates while flying. We propose a new path discretization method to transform the original problem into a discretized equivalent with a finite number of optimization variables, for which we obtain a high-quality suboptimal solution by applying the successive convex approximation technique. The numerical results show that the proposed designs significantly outperform the benchmark schemes.

1,043 citations
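For reference, the closed-form propulsion power model derived in this work expresses the power consumed by a rotary-wing UAV as a function of its flying speed $V$ (stated here as it is commonly cited; $P_0$ and $P_i$ are the blade-profile and induced powers in hover, $U_{\mathrm{tip}}$ the rotor tip speed, $v_0$ the mean rotor induced velocity in hover, $d_0$ the fuselage drag ratio, $\rho$ the air density, $s$ the rotor solidity, and $A$ the rotor disc area):

$$P(V) = P_0\left(1+\frac{3V^2}{U_{\mathrm{tip}}^2}\right) + P_i\left(\sqrt{1+\frac{V^4}{4v_0^4}}-\frac{V^2}{2v_0^2}\right)^{1/2} + \frac{1}{2}d_0\rho s A V^3,$$

with the three terms capturing blade-profile, induced, and parasite power, respectively. The model is non-convex in $V$, which is what makes the trajectory problem hard and the successive convex approximation step necessary.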


Book
10 Jul 2019
Lectures on Convex Optimization (book); no substantive abstract available.

776 citations


Posted Content
TL;DR: In this paper, the authors propose to use long-term memory of past gradients to solve the problem of Adam's convergence to an optimal solution in nonconvex settings.
Abstract: Several recently proposed stochastic optimization methods that have been successfully used in training deep networks, such as RMSProp, Adam, Adadelta, and Nadam, are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients. In many applications, e.g. learning with large output spaces, it has been empirically observed that these algorithms fail to converge to an optimal solution (or a critical point in nonconvex settings). We show that one cause for such failures is the exponential moving average used in the algorithms. We provide an explicit example of a simple convex optimization setting where Adam does not converge to the optimal solution, and describe the precise problems with the previous analysis of the Adam algorithm. Our analysis suggests that the convergence issues can be fixed by endowing such algorithms with 'long-term memory' of past gradients, and we propose new variants of the Adam algorithm which not only fix the convergence issues but often also lead to improved empirical performance.

504 citations
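The 'long-term memory' variant proposed in the published version of this work is AMSGrad: it replaces the exponential moving average of squared gradients in the denominator with its running maximum, so effective stepsizes never increase. A minimal sketch (simplified, without the bias corrections some implementations add):

```python
# Minimal AMSGrad sketch: like Adam, but the second-moment estimate used in
# the denominator is the running max of the EMA, giving non-increasing
# effective stepsizes (the paper's "long-term memory" of past gradients).
import numpy as np

def amsgrad(grad, x0, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, steps=1000):
    x = x0.copy()
    m = np.zeros_like(x)          # first-moment EMA
    v = np.zeros_like(x)          # second-moment EMA
    v_hat = np.zeros_like(x)      # running max of v -- the AMSGrad change
    for _ in range(steps):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        v_hat = np.maximum(v_hat, v)          # never let the denominator shrink
        x -= lr * m / (np.sqrt(v_hat) + eps)
    return x

# Toy usage: minimize ||x||^2.
print(amsgrad(lambda x: 2 * x, np.ones(3)))
```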


Proceedings Article
28 Oct 2019
TL;DR: This paper introduces disciplined parametrized programming, a subset of disciplined convex programming, and demonstrates how to efficiently differentiate through each of these components, allowing for end-to-end analytical differentiation through the entire convex program.
Abstract: Recent work has shown how to embed differentiable optimization problems (that is, problems whose solutions can be backpropagated through) as layers within deep learning architectures. This method provides a useful inductive bias for certain problems, but existing software for differentiable optimization layers is rigid and difficult to apply to new settings. In this paper, we propose an approach to differentiating through disciplined convex programs, a subclass of convex optimization problems used by domain-specific languages (DSLs) for convex optimization. We introduce disciplined parametrized programming, a subset of disciplined convex programming, and we show that every disciplined parametrized program can be represented as the composition of an affine map from parameters to problem data, a solver, and an affine map from the solver’s solution to a solution of the original problem (a new form we refer to as affine-solver-affine form). We then demonstrate how to efficiently differentiate through each of these components, allowing for end-to-end analytical differentiation through the entire convex program. We implement our methodology in version 1.1 of CVXPY, a popular Python-embedded DSL for convex optimization, and additionally implement differentiable layers for disciplined convex programs in PyTorch and TensorFlow 2.0. Our implementation significantly lowers the barrier to using convex optimization problems in differentiable programs. We present applications in linear machine learning models and in stochastic control, and we show that our layer is competitive (in execution time) compared to specialized differentiable solvers from past work.

346 citations
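A small usage sketch of such a differentiable layer, using the companion cvxpylayers package the paper describes (the toy problem below is illustrative, not taken from the paper):

```python
# Sketch of a disciplined-parametrized-program layer: a tiny constrained
# least-squares problem whose solution is differentiable in the parameters.
# Requires cvxpy >= 1.1 and the cvxpylayers package.
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n = 3
x = cp.Variable(n)
A = cp.Parameter((n, n))
b = cp.Parameter(n)
# DPP-compliant problem: affine in the parameters A and b.
prob = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), [x >= 0])
layer = CvxpyLayer(prob, parameters=[A, b], variables=[x])

A_t = torch.eye(n, requires_grad=True)
b_t = torch.ones(n, requires_grad=True)
(x_star,) = layer(A_t, b_t)     # solve the convex program in the forward pass
x_star.sum().backward()         # differentiate through the solver
print(b_t.grad)
```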


Proceedings Article
01 Jan 2019
TL;DR: In this article, the authors present a convex optimization framework to compute guaranteed upper bounds on the Lipschitz constant of DNNs both accurately and efficiently, where activation functions are interpreted as gradients of convex potential functions.
Abstract: Tight estimation of the Lipschitz constant for deep neural networks (DNNs) is useful in many applications ranging from robustness certification of classifiers to stability analysis of closed-loop systems with reinforcement learning controllers. Existing methods in the literature for estimating the Lipschitz constant suffer from either lack of accuracy or poor scalability. In this paper, we present a convex optimization framework to compute guaranteed upper bounds on the Lipschitz constant of DNNs both accurately and efficiently. Our main idea is to interpret activation functions as gradients of convex potential functions. Hence, they satisfy certain properties that can be described by quadratic constraints. This particular description allows us to pose the Lipschitz constant estimation problem as a semidefinite program (SDP). The resulting SDP can be adapted to increase either the estimation accuracy (by capturing the interaction between activation functions of different layers) or scalability (by decomposition and parallel implementation). We illustrate the utility of our approach with a variety of experiments on randomly generated networks and on classifiers trained on the MNIST and Iris datasets. In particular, we experimentally demonstrate that our Lipschitz bounds are the most accurate compared to those in the literature. We also study the impact of adversarial training methods on the Lipschitz bounds of the resulting classifiers and show that our bounds can be used to efficiently provide robustness guarantees.

269 citations
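The quadratic constraints in question are easy to state. If the activation $\varphi$ is slope-restricted on $[\alpha,\beta]$ (e.g., ReLU with $\alpha=0$, $\beta=1$), then elementwise $(\varphi(x_i)-\varphi(y_i)-\alpha(x_i-y_i))(\beta(x_i-y_i)-\varphi(x_i)+\varphi(y_i))\ge 0$, which for any diagonal $T\succeq 0$ gives the matrix form

$$\begin{bmatrix} x-y \\ \varphi(x)-\varphi(y)\end{bmatrix}^{\top}\begin{bmatrix} -2\alpha\beta T & (\alpha+\beta)T \\ (\alpha+\beta)T & -2T \end{bmatrix}\begin{bmatrix} x-y \\ \varphi(x)-\varphi(y)\end{bmatrix} \ge 0.$$

Coupling constraints of this form with the network weights is what turns Lipschitz-constant estimation into a semidefinite program.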


Proceedings Article
06 Jun 2019
TL;DR: In this paper, a theoretical framework for designing and understanding practical meta-learning methods that integrates sophisticated formalizations of task-similarity with the extensive literature on online convex optimization and sequential prediction algorithms is presented.
Abstract: We build a theoretical framework for designing and understanding practical meta-learning methods that integrates sophisticated formalizations of task-similarity with the extensive literature on online convex optimization and sequential prediction algorithms. Our approach enables the task-similarity to be learned adaptively, provides sharper transfer-risk bounds in the setting of statistical learning-to-learn, and leads to straightforward derivations of average-case regret bounds for efficient algorithms in settings where the task-environment changes dynamically or the tasks share a certain geometric structure. We use our theory to modify several popular meta-learning algorithms and improve their training and meta-test-time performance on standard problems in few-shot and federated learning.

227 citations


Proceedings Article
24 May 2019
TL;DR: The norm version of AdaGrad (AdaGrad-Norm) converges to a stationary point at the $\mathcal{O}(\log(N)/\sqrt{N})$ rate in the stochastic setting, and at the optimal $\mathcal{O}(1/N)$ rate in the batch (non-stochastic) setting – in this sense, the convergence guarantees are 'sharp'.
Abstract: Adaptive gradient methods such as AdaGrad and its variants update the stepsize in stochastic gradient descent on the fly according to the gradients received along the way; such methods have gained widespread use in large-scale optimization for their ability to converge robustly, without the need to fine-tune the stepsize schedule. Yet, the theoretical guarantees to date for AdaGrad are for online and convex optimization. We bridge this gap by providing theoretical guarantees for the convergence of AdaGrad for smooth, nonconvex functions. We show that the norm version of AdaGrad (AdaGrad-Norm) converges to a stationary point at the $\mathcal{O}(\log(N)/\sqrt{N})$ rate in the stochastic setting, and at the optimal $\mathcal{O}(1/N)$ rate in the batch (non-stochastic) setting -- in this sense, our convergence guarantees are 'sharp'. In particular, the convergence of AdaGrad-Norm is robust to the choice of all hyper-parameters of the algorithm, in contrast to stochastic gradient descent whose convergence depends crucially on tuning the step-size to the (generally unknown) Lipschitz smoothness constant and level of stochastic noise on the gradient. Extensive numerical experiments are provided to corroborate our theory; moreover, the experiments suggest that the robustness of AdaGrad-Norm extends to state-of-the-art models in deep learning, without sacrificing generalization.

200 citations
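AdaGrad-Norm itself is a two-line algorithm: it accumulates squared gradient norms into a single scalar and divides a base stepsize by its square root. A minimal sketch:

```python
# Minimal AdaGrad-Norm sketch: one scalar accumulator b_t^2 grows with the
# squared gradient norms, so the stepsize eta / b_t adapts automatically to
# the (unknown) smoothness constant and noise level.
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=1e-2, steps=1000):
    x, b_sq = x0.copy(), b0 ** 2
    for _ in range(steps):
        g = grad(x)
        b_sq += np.dot(g, g)             # accumulate squared gradient norms
        x -= eta / np.sqrt(b_sq) * g     # norm-adapted step
    return x

# Toy usage: the same hyper-parameters work across badly scaled quadratics.
print(adagrad_norm(lambda x: np.array([2.0, 200.0]) * x, np.ones(2)))
```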


Posted Content
TL;DR: This monograph introduces the basic concepts of Online Learning through a modern view of Online Convex Optimization, and presents first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings.
Abstract: In this monograph, I introduce the basic concepts of Online Learning through a modern view of Online Convex Optimization. Here, online learning refers to the framework of regret minimization under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings. All the algorithms are clearly presented as instantiation of Online Mirror Descent or Follow-The-Regularized-Leader and their variants. Particular attention is given to the issue of tuning the parameters of the algorithms and learning in unbounded domains, through adaptive and parameter-free online learning algorithms. Non-convex losses are dealt through convex surrogate losses and through randomization. The bandit setting is also briefly discussed, touching on the problem of adversarial and stochastic multi-armed bandits. These notes do not require prior knowledge of convex analysis and all the required mathematical tools are rigorously explained. Moreover, all the proofs have been carefully chosen to be as simple and as short as possible.

196 citations
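The simplest algorithm covered by this framework is online projected gradient descent, an instance of Online Mirror Descent with the Euclidean regularizer; a sketch of the protocol (the toy losses are ours):

```python
# Online projected gradient descent on the L2 ball. For convex Lipschitz
# losses, regret versus the best fixed point grows as O(sqrt(T)).
import numpy as np

def ogd(grads, x0, radius=1.0, eta=0.1):
    """grads: one gradient callable per round, revealed after playing x_t."""
    x, iterates = x0.copy(), []
    for t, grad in enumerate(grads, start=1):
        iterates.append(x.copy())
        x = x - eta / np.sqrt(t) * grad(x)     # step with decaying stepsize
        norm = np.linalg.norm(x)
        if norm > radius:                      # Euclidean projection onto the ball
            x *= radius / norm
    return iterates

# Toy usage: losses f_t(x) = ||x - z_t||^2 with drifting targets z_t.
rng = np.random.default_rng(0)
targets = [0.1 * rng.normal(size=2) for _ in range(100)]
xs = ogd([lambda x, z=z: 2 * (x - z) for z in targets], np.zeros(2))
```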


Posted Content
TL;DR: This tutorial argues that Wasserstein distributionally robust optimization has interesting ramifications for statistical learning and motivates new approaches for fundamental learning tasks such as classification, regression, maximum likelihood estimation or minimum mean square error estimation, among others.
Abstract: Many decision problems in science, engineering and economics are affected by uncertain parameters whose distribution is only indirectly observable through samples. The goal of data-driven decision-making is to learn a decision from finitely many training samples that will perform well on unseen test samples. This learning task is difficult even if all training and test samples are drawn from the same distribution, especially if the dimension of the uncertainty is large relative to the training sample size. Wasserstein distributionally robust optimization seeks data-driven decisions that perform well under the most adverse distribution within a certain Wasserstein distance from a nominal distribution constructed from the training samples. In this tutorial we will argue that this approach has many conceptual and computational benefits. Most prominently, the optimal decisions can often be computed by solving tractable convex optimization problems, and they enjoy rigorous out-of-sample and asymptotic consistency guarantees. We will also show that Wasserstein distributionally robust optimization has interesting ramifications for statistical learning and motivates new approaches for fundamental learning tasks such as classification, regression, maximum likelihood estimation or minimum mean square error estimation, among others.

184 citations
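A flavor of the tractability results discussed here: for losses that are Lipschitz in the data, the worst case over a Wasserstein ball often collapses to empirical risk plus a norm penalty. Schematically, for a 1-Wasserstein ball of radius $\varepsilon$ with norm-induced transport cost and an $L$-Lipschitz loss $\ell$ (exact conditions are spelled out in the tutorial):

$$ \sup_{Q:\, W_1(Q, \widehat{P}_n) \le \varepsilon} \mathbb{E}_{Q}\big[\ell(\langle w, x\rangle)\big] \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell(\langle w, x_i\rangle) \;+\; \varepsilon L \|w\|_{*}, $$

so distributional robustness acts as regularization, and the robust problem remains convex in $w$.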


Journal ArticleDOI
TL;DR: In this paper, the problem of generalized state estimation for an array of Markovian coupled networks under the Round-Robin protocol and redundant channels is investigated by using an extended dissipative property.
Abstract: In this paper, the problem of generalized state estimation for an array of Markovian coupled networks under the Round-Robin protocol (RRP) and redundant channels is investigated by using an extended dissipative property. The randomly varying coupling of the networks under consideration is governed by a Markov chain. With the aid of the RRP, the transmission order of nodes is effectively orchestrated, which reduces the probability of data collisions over a shared, capacity-constrained network. Redundant channels are also used in the signal transmission to deal with the fragility of networks served by a single channel. Network-induced phenomena, namely randomly occurring packet dropouts and randomly occurring quantization, are fully considered. The main purpose of the research is to find a desired estimator design approach such that the extended $(\Omega_1,\Omega_2,\Omega_3)$-$\gamma$-stochastic dissipativity property of the estimation error system is guaranteed. In terms of the Lyapunov–Krasovskii methodology, the Kronecker product and an improved matrix decoupling approach, sufficient conditions for the addressed problem are established by solving some convex optimization problems. Finally, the applicability of the proposed method is demonstrated by an illustrative example.

Journal ArticleDOI
TL;DR: In this article, the authors derive linear convergence rates of several first order methods for solving smooth non-strongly convex constrained optimization problems, i.e. involving an objective function with a Lipschitz continuous gradient that satisfies some relaxed strong convexity condition.
Abstract: The standard assumption for proving linear convergence of first order methods for smooth convex optimization is the strong convexity of the objective function, an assumption which does not hold for many practical applications. In this paper, we derive linear convergence rates of several first order methods for solving smooth non-strongly convex constrained optimization problems, i.e. involving an objective function with a Lipschitz continuous gradient that satisfies some relaxed strong convexity condition. In particular, in the case of smooth constrained convex optimization, we provide several relaxations of the strong convexity conditions and prove that they are sufficient for obtaining linear convergence for several first order methods such as projected gradient, fast gradient and feasible descent methods. We also provide examples of functional classes that satisfy our proposed relaxations of strong convexity conditions. Finally, we show that the proposed relaxed strong convexity conditions cover important applications, including solving linear systems, linear programming, and dual formulations of linearly constrained convex problems.
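A representative relaxation of this kind is quadratic functional growth: rather than strong convexity, one only asks that the objective grow quadratically with the distance to the optimal set $X^\star$,

$$ f(x) - f^\star \;\ge\; \frac{\kappa}{2}\,\mathrm{dist}^2(x, X^\star) \quad \text{for all feasible } x. $$

For example, $f(x)=\tfrac{1}{2}\|Ax-b\|^2$ with a rank-deficient $A$ is not strongly convex, yet satisfies this growth condition with $\kappa$ equal to the smallest nonzero eigenvalue of $A^\top A$; conditions of this type are what allow the linear rates to go through. (The inequality above is one standard form of such relaxations, stated here for illustration.)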

Journal ArticleDOI
TL;DR: In this paper, the authors exploit the multi-antenna non-orthogonal multiple access (NOMA) technique for multiuser computation offloading, such that different users can simultaneously offload their computation tasks to the multi-antenna BS over the same time/frequency resources, and the BS can employ successive interference cancellation (SIC) to efficiently decode all users' offloaded tasks for remote execution.
Abstract: This paper studies a multiuser mobile edge computing (MEC) system in which one base station (BS) serves multiple users with intensive computation tasks. We exploit the multi-antenna non-orthogonal multiple access (NOMA) technique for multiuser computation offloading, such that different users can simultaneously offload their computation tasks to the multi-antenna BS over the same time/frequency resources, and the BS can employ successive interference cancellation (SIC) to efficiently decode all users' offloaded tasks for remote execution. In particular, we pursue energy-efficient MEC designs by considering two cases with partial and binary offloading, respectively. We aim to minimize the weighted sum-energy consumption at all users subject to their computation latency constraints, by jointly optimizing the communication and computation resource allocation as well as the BS's decoding order for SIC. For the case with partial offloading, the weighted sum-energy minimization is a convex optimization problem, for which an efficient algorithm based on the Lagrange duality method is presented to obtain the globally optimal solution. For the case with binary offloading, the weighted sum-energy minimization corresponds to a mixed Boolean convex optimization problem that is generally more difficult to solve. We first use the branch-and-bound (BnB) method to obtain the globally optimal solution and then develop two low-complexity algorithms based on the greedy method and the convex relaxation, respectively, to find suboptimal solutions with high quality in practice. Via numerical results, it is shown that the proposed NOMA-based computation offloading design significantly improves the energy efficiency of the multiuser MEC system as compared to other benchmark schemes. It is also shown that for the case with binary offloading, the proposed greedy method performs close to the optimal BnB-based solution, and the convex relaxation-based solution achieves a suboptimal performance but with lower implementation complexity.

Posted Content
TL;DR: A general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: a larger class of regularizers, and the general modified policy iteration approach, encompassing both policy iteration and value iteration.
Abstract: Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or Kullback-Leibler divergence. We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both policy iteration and value iteration. The core building blocks of this theory are a notion of regularized Bellman operator and the Legendre-Fenchel transform, a classical tool of convex optimization. This approach allows for error propagation analyses of general algorithmic schemes of which (possibly variants of) classical algorithms such as Trust Region Policy Optimization, Soft Q-learning, Stochastic Actor Critic or Dynamic Policy Programming are special cases. This also draws connections to proximal convex optimization, especially to Mirror Descent.
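A concrete instance makes the Legendre-Fenchel connection tangible. The regularized Bellman operator takes the form

$$ (T_{\Omega} V)(s) = \max_{\pi(\cdot|s)\in\Delta_A} \Big[\sum_a \pi(a|s)\, q(s,a) - \Omega\big(\pi(\cdot|s)\big)\Big], \qquad q(s,a) = r(s,a) + \gamma\, \mathbb{E}_{s'|s,a}\big[V(s')\big], $$

i.e., the Legendre-Fenchel transform of the regularizer $\Omega$ evaluated at the Q-values. For the scaled negative entropy $\Omega(\pi) = \tau \sum_a \pi(a|s)\log\pi(a|s)$, this transform is the log-sum-exp, giving $(T_\Omega V)(s) = \tau \log \sum_a \exp(q(s,a)/\tau)$ and recovering soft Q-learning style backups.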

Journal ArticleDOI
TL;DR: This paper studies data collection from a set of sensor nodes (SNs) in WSNs enabled by multiple unmanned aerial vehicles (UAVs), and proposes a simple scheme in which each UAV only collects data while hovering, termed hovering mode (Hmode).
Abstract: Energy consumption is one of the important design aspects for data collection in wireless sensor networks (WSNs). This paper studies data collection from a set of sensor nodes (SNs) in WSNs enabled by multiple unmanned aerial vehicles (UAVs). We aim to minimize the maximum mission completion time among all UAVs by jointly optimizing the UAV trajectory, as well as the wake-up scheduling and association for SNs, while ensuring that each SN can successfully upload the targeted amount of data with a given energy budget. The formulated problem is non-convex and difficult to solve directly. To tackle this problem, we first propose a simple scheme in which each UAV only collects data while hovering, termed hovering mode (Hmode). For this mode, in order to find the optimized hovering locations for each SN and the serving order among all locations, we propose an efficient algorithm by leveraging the min–max multiple Traveling Salesman Problem (min–max m-TSP) and convex optimization techniques. Furthermore, we propose the more general scheme that enables continuous data collection even while flying, termed flying mode (Fmode). By leveraging the bisection method and a time discretization technique, the original problem is transformed into a discretized equivalent with a finite number of optimization variables, based on which a Karush–Kuhn–Tucker (KKT) solution is obtained by applying the successive convex approximation (SCA) technique. The simulation results show that the proposed multi-UAV enabled data collection with joint trajectory and communication design achieves significant performance gains over the benchmark schemes.

Journal ArticleDOI
TL;DR: A new “system level” (SL) approach involving three complementary SL elements that provide an alternative to the Youla parameterization of all stabilizing controllers and the responses they achieve, and combine with SL constraints (SLCs) to parameterize the largest known class of constrained stabilization controllers that admit a convex characterization, generalizing quadratic invariance.
Abstract: Biological and advanced cyber-physical control systems often have limited, sparse, uncertain, and distributed communication and computing in addition to sensing and actuation. Fortunately, the corresponding plants and performance requirements are also sparse and structured, and this must be exploited to make constrained controller design feasible and tractable. We introduce a new “system level” (SL) approach involving three complementary SL elements. SL parameterizations (SLPs) provide an alternative to the Youla parameterization of all stabilizing controllers and the responses they achieve, and combine with SL constraints (SLCs) to parameterize the largest known class of constrained stabilizing controllers that admit a convex characterization, generalizing quadratic invariance. SLPs also lead to a generalization of detectability and stabilizability, suggesting the existence of a rich separation structure, that when combined with SLCs is naturally applicable to structurally constrained controllers and systems. We further provide a catalog of useful SLCs, most importantly including sparsity, delay, and locality constraints on both communication and computing internal to the controller, and external system performance. Finally, we formulate SL synthesis problems, which define the broadest known class of constrained optimal control problems that can be solved using convex programming.
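The heart of the SLP is an affine characterization of achievable closed-loop responses. For a discrete-time plant $x_{t+1}=Ax_t+Bu_t+w_t$, the system responses $\Phi_x, \Phi_u$ mapping the disturbance to state and control input are achievable by an internally stabilizing controller if and only if they satisfy the affine constraint (as commonly stated in the SLS literature)

$$ \begin{bmatrix} zI - A & -B \end{bmatrix} \begin{bmatrix} \Phi_x \\ \Phi_u \end{bmatrix} = I, \qquad \Phi_x, \Phi_u \in \tfrac{1}{z}\mathcal{RH}_\infty. $$

Because the constraint is affine, any convex cost and any convex SLCs (sparsity, delay, locality) imposed on $\Phi_x, \Phi_u$ yield a convex synthesis problem, and the controller is implemented directly from the optimized responses.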

Proceedings ArticleDOI
01 May 2019
TL;DR: Approximate Minima Perturbation is presented, a novel algorithm that can leverage any off-the-shelf optimizer and can be employed without any hyperparameter tuning, thus making it an attractive technique for practical deployment.
Abstract: Building useful predictive models often involves learning from sensitive data. Training models with differential privacy can guarantee the privacy of such sensitive data. For convex optimization tasks, several differentially private algorithms are known, but none has yet been deployed in practice. In this work, we make two major contributions towards practical differentially private convex optimization. First, we present Approximate Minima Perturbation, a novel algorithm that can leverage any off-the-shelf optimizer. We show that it can be employed without any hyperparameter tuning, thus making it an attractive technique for practical deployment. Second, we perform an extensive empirical evaluation of the state-of-the-art algorithms for differentially private convex optimization, on a range of publicly available benchmark datasets, and real-world datasets obtained through an industrial collaboration. We release open-source implementations of all the differentially private convex optimization algorithms considered, and benchmarks on as many as nine public datasets, four of which are high-dimensional.
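As background for how perturbation-based private optimization works, here is a minimal sketch of the classical output-perturbation idea that Approximate Minima Perturbation refines (this is illustrative only, not the paper's AMP calibration, which additionally accounts for the optimizer's inexactness):

```python
# Hedged sketch of output perturbation for private convex ERM: minimize a
# strongly convex objective, then add Gaussian noise scaled to the L2
# sensitivity of the minimizer (Chaudhuri et al.-style analysis, assuming a
# 1-Lipschitz loss and feature norms at most 1).
import numpy as np
from scipy.optimize import minimize

def private_logreg(X, y, lam=0.1, eps=1.0, delta=1e-5, rng=None):
    rng = rng or np.random.default_rng(0)
    # Scale rows so ||x_i|| <= 1, as required by the sensitivity bound.
    X = X / np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
    n, d = X.shape

    def obj(w):  # L2-regularized logistic loss, labels y in {-1, +1}
        return np.mean(np.logaddexp(0.0, -(X @ w) * y)) + 0.5 * lam * w @ w

    w_hat = minimize(obj, np.zeros(d)).x            # any off-the-shelf optimizer
    sensitivity = 2.0 / (n * lam)                   # L2 sensitivity of the argmin
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return w_hat + rng.normal(scale=sigma, size=d)  # Gaussian mechanism

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=500))
print(private_logreg(X, y))
```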

Journal Article
TL;DR: It is shown that the Unadjusted Langevin Algorithm can be formulated as a first order optimization algorithm of an objective functional defined on the Wasserstein space of order $2$ and a non-asymptotic analysis of this method to sample from logconcave smooth target distribution is given.
Abstract: In this paper, we provide new insights on the Unadjusted Langevin Algorithm. We show that this method can be formulated as a first order optimization algorithm of an objective functional defined on the Wasserstein space of order $2$. Using this interpretation and techniques borrowed from convex optimization, we give a non-asymptotic analysis of this method to sample from logconcave smooth target distribution on $\mathbb{R}^d$. Based on this interpretation, we propose two new methods for sampling from a non-smooth target distribution, which we analyze as well. Besides, these new algorithms are natural extensions of the Stochastic Gradient Langevin Dynamics (SGLD) algorithm, which is a popular extension of the Unadjusted Langevin Algorithm. Similar to SGLD, they only rely on approximations of the gradient of the target log density and can be used for large-scale Bayesian inference.
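For reference, the algorithm under analysis is the Euler–Maruyama discretization of the Langevin diffusion; a minimal sampler sketch for a target density $\pi \propto e^{-U}$ (toy setup, not the paper's experiments):

```python
# Minimal Unadjusted Langevin Algorithm sketch: gradient descent on the
# potential U = -log(target density), plus Gaussian noise of matched scale.
# Viewed as in the paper, this is a first-order method on Wasserstein space.
import numpy as np

def ula(grad_U, x0, step=1e-2, n_samples=10_000, rng=None):
    rng = rng or np.random.default_rng(0)
    x, samples = np.asarray(x0, dtype=float), []
    for _ in range(n_samples):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad_U(x) + np.sqrt(2 * step) * noise
        samples.append(x.copy())
    return np.array(samples)

# Toy usage: standard Gaussian target, U(x) = ||x||^2 / 2, grad U = x.
draws = ula(lambda x: x, np.zeros(2))
print(draws.mean(axis=0), draws.var(axis=0))  # approx. 0 and 1, with O(step) bias
```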

Proceedings Article
01 Jan 2019
TL;DR: The approach builds on existing differentially private algorithms and relies on the analysis of algorithmic stability to ensure generalization and implies that, contrary to intuition based on private ERM, private SCO has asymptotically the same rate of $1/\sqrt{n}$ as non-private SCO in the parameter regime most common in practice.
Abstract: We study differentially private (DP) algorithms for stochastic convex optimization (SCO). In this problem the goal is to approximately minimize the population loss given i.i.d. samples from a distribution over convex and Lipschitz loss functions. A long line of existing work on private convex optimization focuses on the empirical loss and derives asymptotically tight bounds on the excess empirical loss. However a significant gap exists in the known bounds for the population loss. We show that, up to logarithmic factors, the optimal excess population loss for DP algorithms is equal to the larger of the optimal non-private excess population loss, and the optimal excess empirical loss of DP algorithms. This implies that, contrary to intuition based on private ERM, private SCO has asymptotically the same rate of $1/\sqrt{n}$ as non-private SCO in the parameter regime most common in practice. The best previous result in this setting gives a rate of $1/n^{1/4}$. Our approach builds on existing differentially private algorithms and relies on the analysis of algorithmic stability to ensure generalization.

Journal ArticleDOI
TL;DR: This paper addresses the problem of quantized feedback control of nonlinear Markov jump systems (MJSs) by constructing a Lyapunov function which depends both on mode information and fuzzy basis functions, and derives a criterion which ensures stochastic stability with a predefined $l_2$-$l_\infty$ performance of the resulting closed-loop system.
Abstract: This paper addresses the problem of quantized feedback control of nonlinear Markov jump systems (MJSs). The nonlinear plant is represented by a class of fuzzy MJSs with time-varying delay based on a Takagi–Sugeno fuzzy model. The quantized signal is utilized for control purposes and the sector bound approach is exploited to deal with quantization errors. By constructing a Lyapunov function which depends both on mode information and fuzzy basis functions, the reciprocally convex approach is used to derive a criterion which ensures stochastic stability with a predefined $l_2$-$l_\infty$ performance of the resulting closed-loop system. The design of the quantized feedback controller is then converted to a convex optimization problem, which can be handled through the linear matrix inequality technique. Finally, a simulation example is presented to verify the effectiveness and practicability of the proposed design techniques.
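Design conditions of this kind are linear matrix inequalities, hence solvable with off-the-shelf convex solvers. As a stripped-down illustration (a plain Lyapunov LMI, not the paper's mode-dependent conditions):

```python
# Stripped-down LMI example: certify stability of dx/dt = A x by finding
# P > 0 with A^T P + P A < 0. The paper's controller-design conditions are
# larger LMIs of the same species, handled the same way.
import cvxpy as cp
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])    # a stable test matrix
P = cp.Variable((2, 2), symmetric=True)
eps = 1e-6
constraints = [P >> eps * np.eye(2),
               A.T @ P + P @ A << -eps * np.eye(2)]
cp.Problem(cp.Minimize(cp.trace(P)), constraints).solve()
print(P.value)                  # a Lyapunov certificate of stability
```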

Book ChapterDOI
23 Aug 2019
TL;DR: Wasserstein distributionally robust optimization, as presented in this tutorial, seeks data-driven decisions that perform well under the most adverse distribution within a certain Wasserstein distance of a nominal distribution constructed from the training samples; the optimal decisions can often be computed by solving tractable convex optimization problems and enjoy rigorous out-of-sample and asymptotic consistency guarantees.
Abstract: Identical to the abstract of the Posted Content version of this tutorial listed above.

Journal ArticleDOI
Guannan Qu, Na Li
01 Jan 2019
TL;DR: In this article, the authors studied the global exponential stability of primal-dual gradient dynamics for convex optimization with strongly convex and smooth objectives and affine equality or inequality constraints.
Abstract: Continuous time primal-dual gradient dynamics (PDGD) that find a saddle point of a Lagrangian of an optimization problem have been widely used in systems and control. While the global asymptotic stability of such dynamics has been well-studied, it is less studied whether they are globally exponentially stable. In this letter, we study the PDGD for convex optimization with strongly convex and smooth objectives and affine equality or inequality constraints, and prove global exponential stability for such dynamics. Bounds on decaying rates are provided.
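The dynamics in question perform gradient descent on the Lagrangian in the primal variable and gradient ascent in the dual; a forward-Euler simulation sketch for an equality-constrained problem min f(x) s.t. Ax = b (toy setup, not from the letter):

```python
# Primal-dual gradient dynamics for min f(x) s.t. Ax = b, simulated by
# forward Euler:  x' = -(grad f(x) + A^T lam),  lam' = Ax - b.
# With f strongly convex and smooth, the letter shows this flow converges
# exponentially to the saddle point of the Lagrangian.
import numpy as np

def pdgd(grad_f, A, b, x0, lam0, dt=1e-2, steps=20_000):
    x, lam = x0.copy(), lam0.copy()
    for _ in range(steps):
        x_dot = -(grad_f(x) + A.T @ lam)   # primal descent on L(x, lam)
        lam_dot = A @ x - b                # dual ascent on L(x, lam)
        x, lam = x + dt * x_dot, lam + dt * lam_dot
    return x, lam

# Toy usage: f(x) = 0.5 ||x||^2 subject to x1 + x2 = 1 -> x* = (0.5, 0.5).
A, b = np.array([[1.0, 1.0]]), np.array([1.0])
x_star, _ = pdgd(lambda x: x, A, b, np.zeros(2), np.zeros(1))
print(x_star)
```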

Journal ArticleDOI
TL;DR: In this paper, a wireless powered mobile edge computing (MEC) system with fluctuating channels and dynamic task arrivals over time is studied; the authors jointly optimize the transmission energy allocation at the energy transmitter (ET) and the task allocation at the user for local computing and offloading over a particular finite horizon, with the objective of minimizing the total transmission energy consumption at the ET while ensuring the user's successful task execution.
Abstract: This paper studies a wireless powered mobile edge computing (MEC) system with fluctuating channels and dynamic task arrivals over time. We jointly optimize the transmission energy allocation at the energy transmitter (ET) for wireless power transfer (WPT) and the task allocation at the user for local computing and offloading over a particular finite horizon, with the objective of minimizing the total transmission energy consumption at the ET while ensuring the user's successful task execution. First, in order to characterize the fundamental performance limit, we consider the offline optimization by assuming that perfect knowledge of the channel state information (CSI) and the task state information (TSI, i.e., task arrival timing and amounts) is known a priori. In this case, we obtain the well-structured optimal solution in closed form to the energy minimization problem via convex optimization techniques. Next, inspired by the structured offline solutions obtained above, we develop heuristic online designs for the joint energy and task allocation when the CSI/TSI is only causally known. Finally, numerical results are provided to show that the proposed joint designs achieve significantly smaller energy consumption than benchmark schemes with only local computing or full offloading at the user, and that the proposed heuristic online designs perform close to the optimal offline solutions.

Journal ArticleDOI
TL;DR: In this article, the authors show that most of the results relating submodularity and convexity for set-functions can be extended to all submodular functions, with links to multi-marginal optimal transport providing a new interpretation of existing results for set-functions.
Abstract: Submodular set-functions have many applications in combinatorial optimization, as they can be minimized and approximately maximized in polynomial time. A key element in many of the algorithms and analyses is the possibility of extending the submodular set-function to a convex function, which opens up tools from convex optimization. Submodularity goes beyond set-functions and has naturally been considered for problems with multiple labels or for functions defined on continuous domains, where it corresponds essentially to cross second-derivatives being nonpositive. In this paper, we show that most results relating submodularity and convexity for set-functions can be extended to all submodular functions. In particular, (a) we naturally define a continuous extension in a set of probability measures, (b) show that the extension is convex if and only if the original function is submodular, (c) prove that the problem of minimizing a submodular function is equivalent to a typically non-smooth convex optimization problem, and (d) propose another convex optimization problem with better computational properties (e.g., a smooth dual problem). Most of these extensions from the set-function situation are obtained by drawing links with the theory of multi-marginal optimal transport, which provides also a new interpretation of existing results for set-functions. We then provide practical algorithms to minimize generic submodular functions on discrete domains, with associated convergence rates.
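For set-functions, the convex extension in question is the Lovász extension, computable by a sort; a minimal sketch (the paper's contribution generalizes this picture beyond set-functions):

```python
# Lovász extension sketch: the canonical convex extension of a set-function
# F: 2^V -> R to the cube [0, 1]^n. It is convex iff F is submodular, which
# is the set-function case of the paper's more general equivalence.
import numpy as np

def lovasz_extension(F, w):
    """F maps a set of indices to a real; w is a point in [0, 1]^n."""
    order = np.argsort(-np.asarray(w))       # sort coordinates decreasingly
    value, prev_set = 0.0, frozenset()
    for j in order:
        cur_set = prev_set | {int(j)}
        value += w[j] * (F(cur_set) - F(prev_set))   # telescoping marginal gains
        prev_set = cur_set
    return value

# Toy usage: cut function of the single edge (0, 1), F(S) = [|S ∩ {0,1}| = 1];
# its Lovász extension is |w_0 - w_1|.
F = lambda S: float((0 in S) != (1 in S))
print(lovasz_extension(F, np.array([0.8, 0.3])))     # 0.5
```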

Posted Content
TL;DR: Perturbed versions of GD and SGD are analyzed and it is shown that they are truly efficient: their dimension dependence is only polylogarithmic.
Abstract: Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning. While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable successes in machine learning have involved nonconvex optimization, and a gap has arisen between theory and practice. Indeed, traditional analyses of GD and SGD show that both algorithms converge to stationary points efficiently. But these analyses do not take into account the possibility of converging to saddle points. More recent theory has shown that GD and SGD can avoid saddle points, but the dependence on dimension in these analyses is polynomial. For modern machine learning, where the dimension can be in the millions, such dependence would be catastrophic. We analyze perturbed versions of GD and SGD and show that they are truly efficient: their dimension dependence is only polylogarithmic. Indeed, these algorithms converge to second-order stationary points in essentially the same time as they take to converge to classical first-order stationary points.
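The perturbation scheme is simple: run ordinary gradient descent, and whenever the gradient is small (a candidate saddle), inject an isotropic random kick and continue. A sketch of perturbed GD (simplified from the analyzed variant; the thresholds below are illustrative, not the paper's tuned constants):

```python
# Perturbed gradient descent sketch: vanilla GD plus an occasional random
# kick when the gradient is small, which lets the iterate escape strict
# saddle points.
import numpy as np

def perturbed_gd(grad, x0, eta=0.05, g_thresh=1e-3, radius=1e-1,
                 cooldown=50, steps=200, rng=None):
    rng = rng or np.random.default_rng(0)
    x, last_kick = x0.copy(), -cooldown
    for t in range(steps):
        g = grad(x)
        if np.linalg.norm(g) < g_thresh and t - last_kick >= cooldown:
            u = rng.standard_normal(x.shape)           # random direction
            x = x + radius * u / np.linalg.norm(u)     # kick off the saddle
            last_kick = t
        else:
            x = x - eta * g
    return x

# Toy usage: f(x, y) = x^2 - y^2 has a strict saddle at the origin; plain GD
# started exactly there stays put, while perturbed GD escapes downhill
# (|y| grows, since this toy f is unbounded below).
saddle_grad = lambda z: np.array([2 * z[0], -2 * z[1]])
print(perturbed_gd(saddle_grad, np.zeros(2)))
```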

Journal ArticleDOI
TL;DR: Results showed that the proposed model is suitable for short-term microgrid energy management, and when compared to stochastic approaches, the proposed formulation proved to be more flexible and less time-consuming.
Abstract: This paper presents an energy management system (EMS) for single-phase or balanced three-phase microgrids via robust convex optimization. Along a finite planning horizon, the solution provided by the proposed microgrid EMS remains feasible under adverse conditions of random demands and renewable energy resources. The proposed model is represented as a convex mixed-integer second-order cone programming model. Two operation modes are considered: grid-connected and isolated. In grid-connected mode, the proposed EMS minimizes the costs of energy imports, dispatches of distributed generation (DG) units, and the operation of the energy storage systems. In isolated mode, the proposed EMS minimizes the unsupplied demand considering consumer priorities. Global robustness of the proposed mathematical model is adjusted using a single parameter $\zeta$. The robustness of the solutions provided by the robust EMS is assessed using the Monte Carlo simulation method. In this case, DG units are set to operate in frequency and voltage droop control to support network fluctuations. Simulations are deployed using a microgrid with 136 nodes and several distributed energy resources. Results showed that the proposed model is suitable for short-term microgrid energy management. The robustness of the final solution was directly proportional to the operational costs, and it can be effectively controlled by the proposed parameter $\zeta$ for both operation modes. When compared to stochastic approaches, the proposed formulation proved to be more flexible and less time-consuming.

Journal ArticleDOI
TL;DR: This paper addresses the design and analysis of feedback-based online algorithms to control systems or networked systems based on performance objectives and engineering constraints that may evolve over time using the emerging time-varying convex optimization formalism.
Abstract: This paper addresses the design and analysis of feedback-based online algorithms to control systems or networked systems based on performance objectives and engineering constraints that may evolve over time. The emerging time-varying convex optimization formalism is leveraged to model optimal operational trajectories of the systems, as well as explicit local and network-level operational constraints. Departing from existing batch and feed-forward optimization approaches, the design of the algorithms capitalizes on an online implementation of primal-dual projected-gradient methods; the gradient steps are, however, suitably modified to accommodate feedback from the system in the form of measurements, hence, the term "online optimization with feedback." By virtue of this approach, the resultant algorithms can cope with model mismatches in the algebraic representation of the system states and outputs, they avoid pervasive measurements of exogenous inputs, and they naturally lend themselves to a distributed implementation. Under suitable assumptions, analytical convergence claims are established in terms of dynamic regret. Furthermore, when the synthesis of the feedback-based online algorithms is based on a regularized Lagrangian function, $\boldsymbol{Q}$-linear convergence to solutions of the time-varying optimization problem is shown.

Posted Content
TL;DR: CRAIG, as discussed in this paper, selects a weighted subset (or coreset) of training data that closely estimates the full gradient by maximizing a submodular function, and applying IG to this subset is guaranteed to converge to the (near)optimal solution with the same convergence rate as IG on the full dataset for convex optimization.
Abstract: Incremental gradient (IG) methods, such as stochastic gradient descent and its variants are commonly used for large scale optimization in machine learning. Despite the sustained effort to make IG methods more data-efficient, it remains an open question how to select a training data subset that can theoretically and practically perform on par with the full dataset. Here we develop CRAIG, a method to select a weighted subset (or coreset) of training data that closely estimates the full gradient by maximizing a submodular function. We prove that applying IG to this subset is guaranteed to converge to the (near)optimal solution with the same convergence rate as that of IG for convex optimization. As a result, CRAIG achieves a speedup that is inversely proportional to the size of the subset. To our knowledge, this is the first rigorous method for data-efficient training of general machine learning models. Our extensive set of experiments show that CRAIG, while achieving practically the same solution, speeds up various IG methods by up to 6x for logistic regression and 3x for training deep neural networks.

Journal ArticleDOI
TL;DR: Numerical tests in fog computation offloading tasks corroborate that the proposed BanSaP approach offers competitive performance relative to existing approaches that are based on gradient feedback.
Abstract: This paper deals with online convex optimization involving both time-varying loss functions, and time-varying constraints. The loss functions are not fully accessible to the learner, and instead only the function values (also known as bandit feedback) are revealed at queried points. The constraints are revealed after making decisions, and can be instantaneously violated, yet they must be satisfied in the long term. This setting fits nicely the emerging online network tasks such as fog computing in the Internet-of-Things, where online decisions must flexibly adapt to the changing user preferences (loss functions), and the temporally unpredictable availability of resources (constraints). Tailored for such human-in-the-loop systems where the loss functions are hard to model, a family of online bandit saddle-point (BanSaP) schemes are developed, which adaptively adjust the online operations based on (possibly multiple) bandit feedback of the loss functions, and the changing environment. Performance here is assessed by: 1) dynamic regret that generalizes the widely used static regret and 2) fit that captures the accumulated amount of constraint violations. Specifically, BanSaP is proved to simultaneously yield sublinear dynamic regret and fit, provided that the best dynamic solutions vary slowly over time. Numerical tests in fog computation offloading tasks corroborate that our proposed BanSaP approach offers competitive performance relative to existing approaches that are based on gradient feedback.
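For context, the bandit feedback typically enters through a one-point gradient estimate of the kind BanSaP builds on (stated generically here; the paper also uses multi-point variants): querying the loss at a single randomly perturbed point gives

$$ \widehat{\nabla} f_t(x) = \frac{d}{\delta}\, f_t(x + \delta u)\, u, \qquad u \sim \mathrm{Unif}(\mathbb{S}^{d-1}), $$

which is an unbiased estimate of the gradient of the smoothed loss $x \mapsto \mathbb{E}_{v \sim \mathrm{Unif}(\mathbb{B}^d)}\big[f_t(x+\delta v)\big]$; BanSaP plugs such estimates into an online primal-dual saddle-point update to handle the long-term constraints.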

Journal ArticleDOI
TL;DR: By transforming the filtering problem to a convex optimization one, conditions are presented to design the fuzzy reduced-order filter and two illustrative examples are used to verify the feasibility and applicability of the proposed design scheme.
Abstract: This paper is concerned with the problem of generalized $\mathcal{H}_{2}$ reduced-order filter design for continuous Takagi–Sugeno fuzzy systems using an event-triggered scheme. For a continuous Takagi–Sugeno fuzzy dynamic system, a reduced-order filter is designed to transform the original model into a linear lower-order one. This filter can also approximate the original system with $\mathcal{H}_{2}$ performance, with a new type of event-triggered scheme used to decrease the communication loads and computation resources within the network. By transforming the filtering problem into a convex optimization one, conditions are presented to design the fuzzy reduced-order filter. Finally, two illustrative examples are used to verify the feasibility and applicability of the proposed design scheme.