Showing papers on "Convex optimization published in 2015"

PDF

Open Access

Proceedings Article•

Adam: A Method for Stochastic Optimization

[...]

Diederik P. Kingma¹, Jimmy Ba²•Institutions (2)

University of Amsterdam¹, University of Toronto²

01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

...read moreread less

Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

...read moreread less

111,197 citations

Book•

Convex Optimization: Algorithms and Complexity

[...]

Sébastien Bubeck¹•Institutions (1)

Microsoft¹

28 Oct 2015

TL;DR: This monograph presents the main complexity theorems in convex optimization and their corresponding algorithms and provides a gentle introduction to structural optimization with FISTA, saddle-point mirror prox, Nemirovski's alternative to Nesterov's smoothing, and a concise description of interior point methods.

...read moreread less

Abstract: This monograph presents the main complexity theorems in convex optimization and their corresponding algorithms. Starting from the fundamental theory of black-box optimization, the material progresses towards recent advances in structural optimization and stochastic optimization. Our presentation of black-box optimization, strongly influenced by the seminal book of Nesterov, includes the analysis of cutting plane methods, as well as accelerated gradient descent schemes. We also pay special attention to non-Euclidean settings relevant algorithms include Frank-Wolfe, mirror descent, and dual averaging and discuss their relevance in machine learning. We provide a gentle introduction to structural optimization with FISTA to optimize a sum of a smooth and a simple non-smooth term, saddle-point mirror prox Nemirovski's alternative to Nesterov's smoothing, and a concise description of interior point methods. In stochastic optimization we discuss stochastic gradient descent, mini-batches, random coordinate descent, and sublinear algorithms. We also briefly touch upon convex relaxation of combinatorial problems and the use of randomness to round solutions, as well as random walks based methods.

...read moreread less

1,213 citations

Book•

Convex Optimization Algorithms

[...]

Dimitri P. Bertsekas

10 Feb 2015

TL;DR: This book aims to provide a history of Balkan literature from 1989 to the present day in the context of the conflicts of the 1990s and beyond.

...read moreread less

Abstract: c 2015 Dimitri P. Bertsekas All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

...read moreread less

648 citations

Journal Article•DOI•

Distributed convex optimization via continuous-time coordination algorithms with discrete-time communication

[...]

Solmaz S. Kia¹, Jorge E. Cortes², Sonia Martinez²•Institutions (2)

University of California, Irvine¹, University of California, San Diego²

01 May 2015-Automatica

TL;DR: The exponential convergence of the proposed algorithm under strongly connected and weight-balanced digraph topologies when the local costs are strongly convex with globally Lipschitz gradients is established, and an upper bound on the stepsize is provided that guarantees exponential convergence over connected graphs for implementations with periodic communication.

...read moreread less

543 citations

Journal Article•DOI•

Phase Retrieval via Matrix Completion

[...]

Emmanuel J. Candès¹, Yonina C. Eldar, Thomas Strohmer², Vladislav Voroninski³•Institutions (3)

Stanford University¹, University of California, Davis², University of California, Berkeley³

08 May 2015-Siam Review

TL;DR: This paper develops a novel framework for phase retrieval, a problem which arises in X-ray crystallography, diffraction imaging, astronomical imaging, and many other applications, and combines multiple structured illuminations together with ideas from convex programming to recover the phase from intensity measurements.

...read moreread less

Abstract: This paper develops a novel framework for phase retrieval, a problem which arises in X-ray crystallography, diffraction imaging, astronomical imaging, and many other applications. Our approach, cal...

...read moreread less

533 citations

Posted Content•

Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey.

[...]

Dimitri P. Bertsekas

03 Jul 2015-arXiv: Systems and Control

TL;DR: A unified algorithmic framework is introduced for incremental methods for minimizing a sum P m=1 fi(x) consisting of a large number of convex component functions fi, including the advantages offered by randomization in the selection of components.

...read moreread less

Abstract: We survey incremental methods for minimizing a sum P m=1 fi(x) consisting of a large number of convex component functions fi. Our methods consist of iterations applied to single components, and have proved very effective in practice. We introduce a unified algorithmic framework for a variety of such methods, some involving gradient and subgradient iterations, which are known, and some involving combinations of subgradient and proximal methods, which are new and offer greater flexibility in exploiting the special structure of fi. We provide an analysis of the convergence and rate of convergence properties of these methods, including the advantages offered by randomization in the selection of components. We also survey applications in inference/machine learning, signal processing, and large-scale and distributed optimization.

...read moreread less

444 citations

Proceedings Article•

Accelerated proximal gradient methods for nonconvex programming

[...]

Huan Li¹, Zhouchen Lin¹•Institutions (1)

Shanghai Jiao Tong University¹

07 Dec 2015

TL;DR: This paper is the first to provide APG-type algorithms for general nonconvex and nonsmooth problems ensuring that every accumulation point is a critical point, and the convergence rates remain O(1/k2) when the problems are convex.

...read moreread less

Abstract: Nonconvex and nonsmooth problems have recently received considerable attention in signal/image processing, statistics and machine learning. However, solving the nonconvex and nonsmooth optimization problems remains a big challenge. Accelerated proximal gradient (APG) is an excellent method for convex programming. However, it is still unknown whether the usual APG can ensure the convergence to a critical point in nonconvex programming. In this paper, we extend APG for general nonconvex and nonsmooth programs by introducing a monitor that satisfies the sufficient descent property. Accordingly, we propose a monotone APG and a nonmonotone APG. The latter waives the requirement on monotonic reduction of the objective function and needs less computation in each iteration. To the best of our knowledge, we are the first to provide APG-type algorithms for general nonconvex and nonsmooth problems ensuring that every accumulation point is a critical point, and the convergence rates remain O(1/k2) when the problems are convex, in which k is the number of iterations. Numerical results testify to the advantage of our algorithms in speed.

...read moreread less

359 citations

Posted Content•

Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization

[...]

Xiangru Lian¹, Yijun Huang¹, Yuncheng Li¹, Ji Liu¹•Institutions (1)

University of Rochester¹

27 Jun 2015-arXiv: Optimization and Control

TL;DR: In this article, two asynchronous parallel implementations of stochastic gradient (SG) have been studied on the computer network and the shared memory system, and it was shown that the linear speedup is achievable if the number of workers is bounded by the total number of iterations.

...read moreread less

Abstract: Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in solving deep neural network and received many successes in practice recently. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill the gaps in theory and provide theoretical supports, this paper studies two asynchronous parallel implementations of SG: one is on the computer network and the other is on the shared memory system. We establish an ergodic convergence rate $O(1/\sqrt{K})$ for both algorithms and prove that the linear speedup is achievable if the number of workers is bounded by $\sqrt{K}$ ($K$ is the total number of iterations). Our results generalize and improve existing analysis for convex minimization.

...read moreread less

346 citations

Journal Article•DOI•

Playing with Duality: An overview of recent primal?dual approaches for solving large-scale optimization problems

[...]

Nikos Komodakis¹, Jean-Christophe Pesquet¹•Institutions (1)

University of Paris¹

14 Oct 2015-IEEE Signal Processing Magazine

TL;DR: In this article, the authors present the principles of primal?dual approaches while providing an overview of the numerical methods that have been proposed in different contexts, including convex analysis, discrete optimization, parallel processing, and nonsmooth optimization with an emphasis on sparsity issues.

...read moreread less

Abstract: Optimization methods are at the core of many problems in signal/image processing, computer vision, and machine learning. For a long time, it has been recognized that looking at the dual of an optimization problem may drastically simplify its solution. However, deriving efficient strategies that jointly bring into play the primal and dual problems is a more recent idea that has generated many important new contributions in recent years. These novel developments are grounded in the recent advances in convex analysis, discrete optimization, parallel processing, and nonsmooth optimization with an emphasis on sparsity issues. In this article, we aim to present the principles of primal?dual approaches while providing an overview of the numerical methods that have been proposed in different contexts. Last but not least, primal?dual methods lead to algorithms that are easily parallelizable. Today, such parallel algorithms are becoming increasingly important for efficiently handling high-dimensional problems.

...read moreread less

316 citations

Journal Article•DOI•

Phase retrieval from coded diffraction patterns

[...]

Emmanuel J. Candès¹, Xiaodong Li², Mahdi Soltanolkotabi¹•Institutions (2)

Stanford University¹, University of Pennsylvania²

01 Sep 2015-Applied and Computational Harmonic Analysis

TL;DR: In this paper, the phase information of an object was recovered from intensity-only measurements, a problem which naturally appears in X-ray crystallography and related disciplines, where one can modulate the signal of interest and then collect the intensity of its diffraction pattern.

...read moreread less

310 citations

Journal Article•DOI•

A Survey of Linear Parameter-Varying Control Applications Validated by Experiments or High-Fidelity Simulations

[...]

Christian Hoffmann¹, Herbert Werner¹•Institutions (1)

Hamburg University of Technology¹

01 Mar 2015-IEEE Transactions on Control Systems and Technology

TL;DR: This paper provides a survey of results in linear parameter-varying (LPV) control that have been validated by experiments and/or high-fidelity simulations.

...read moreread less

Abstract: This paper provides a survey of results in linear parameter-varying (LPV) control that have been validated by experiments and/or high-fidelity simulations. The LPV controller synthesis techniques employed in the references of this survey are briefly reviewed and compared. The methods are classified into polytopic, linear fractional transformation, and gridding-based techniques and it is reviewed how in each of these approaches, synthesis can be carried out as a convex optimization problem via a finite number of linear matrix inequalities (LMIs) for both parameter-independent and parameter-dependent Lyapunov functions. The literature is categorized with regard to the application, the complexity induced by the controlled system’s dynamic and scheduling orders, as well as the synthesis method. Exemplary cases dealing with specific control design problems are presented in more detail to point control engineers to possible approaches that have been successfully applied. Furthermore, key publications in LPV control are related to application achievements on a timeline.

...read moreread less

Journal Article•DOI•

Universal gradient methods for convex optimization problems

[...]

Yu. Nesterov¹•Institutions (1)

Catholic University of Leuven¹

01 Aug 2015-Mathematical Programming

TL;DR: New methods for black-box convex minimization are presented, which demonstrate that the fast rate of convergence, typical for the smooth optimization problems, sometimes can be achieved even on nonsmooth problem instances.

...read moreread less

Abstract: In this paper, we present new methods for black-box convex minimization. They do not need to know in advance the actual level of smoothness of the objective function. Their only essential input parameter is the required accuracy of the solution. At the same time, for each particular problem class they automatically ensure the best possible rate of convergence. We confirm our theoretical results by encouraging numerical experiments, which demonstrate that the fast rate of convergence, typical for the smooth optimization problems, sometimes can be achieved even on nonsmooth problem instances.

...read moreread less

Journal Article•DOI•

RSS-Based Localization in Wireless Sensor Networks Using Convex Relaxation: Noncooperative and Cooperative Schemes

[...]

Slavisa Tomic, Marko Beko, Rui Dinis

01 May 2015-IEEE Transactions on Vehicular Technology

TL;DR: New approaches based on convex optimization to address the received signal strength (RSS)-based noncooperative and cooperative localization problems in wireless sensor networks (WSNs) are proposed by using an array of passive anchor nodes to collect noisy RSS measurements from radiating source nodes in WSNs.

...read moreread less

Abstract: In this paper, we propose new approaches based on convex optimization to address the received signal strength (RSS)-based noncooperative and cooperative localization problems in wireless sensor networks (WSNs). By using an array of passive anchor nodes, we collect the noisy RSS measurements from radiating source nodes in WSNs, which we use to estimate the source positions. We derive the maximum likelihood (ML) estimator, since the ML-based solutions have particular importance due to their asymptotically optimal performance. However, the ML estimator requires the minimization of a nonconvex objective function that may have multiple local optima, thus making the search for the globally optimal solution hard. To overcome this difficulty, we derive a new nonconvex estimator, which tightly approximates the ML estimator for small noise. Then, the new estimator is relaxed by applying efficient convex relaxations that are based on second-order cone programming and semidefinite programming in the case of noncooperative and cooperative localization, respectively, for both cases of known and unknown source transmit power. We also show that our approaches work well in the case when the source transmit power and the path loss exponent are simultaneously unknown at the anchor nodes. Moreover, we show that the generalization of the new approaches for the localization problem in indoor environments is straightforward. Simulation results show that the proposed approaches significantly improve the localization accuracy, reducing the estimation error between 15% and 20% on average, compared with the existing approaches.

...read moreread less

Journal Article•DOI•

A Second-Order Multi-Agent Network for Bound-Constrained Distributed Optimization

[...]

Qingshan Liu¹, Jun Wang²•Institutions (2)

Huazhong University of Science and Technology¹, Dalian University of Technology²

27 Mar 2015-IEEE Transactions on Automatic Control

TL;DR: This technical note presents a second-order multi-agent network for distributed optimization with a sum of convex objective functions subject to bound constraints that is capable of solving more general constrained distributed optimization problems.

...read moreread less

Abstract: This technical note presents a second-order multi-agent network for distributed optimization with a sum of convex objective functions subject to bound constraints. In the multi-agent network, the agents connect each others locally as an undirected graph and know only their own objectives and constraints. The multi-agent network is proved to be able to reach consensus to the optimal solution under mild assumptions. Moreover, the consensus of the multi-agent network is converted to the convergence of a dynamical system, which is proved using the Lyapunov method. Compared with existing multi-agent networks for optimization, the second-order multi-agent network herein is capable of solving more general constrained distributed optimization problems. Simulation results on two numerical examples are presented to substantiate the performance and characteristics of the multi-agent network.

...read moreread less

Proceedings Article•DOI•

Network Lasso: Clustering and Optimization in Large Graphs

[...]

David Hallac¹, Jure Leskovec¹, Stephen Boyd¹•Institutions (1)

Stanford University¹

10 Aug 2015

TL;DR: The network lasso is introduced, a generalization of the group lasso to a network setting that allows for simultaneous clustering and optimization on graphs and an algorithm based on the Alternating Direction Method of Multipliers (ADMM) to solve this problem in a distributed and scalable manner.

...read moreread less

Abstract: Convex optimization is an essential tool for modern data analysis, as it provides a framework to formulate and solve many problems in machine learning and data mining. However, general convex optimization solvers do not scale well, and scalable solvers are often specialized to only work on a narrow class of problems. Therefore, there is a need for simple, scalable algorithms that can solve many common optimization problems. In this paper, we introduce the network lasso, a generalization of the group lasso to a network setting that allows for simultaneous clustering and optimization on graphs. We develop an algorithm based on the Alternating Direction Method of Multipliers (ADMM) to solve this problem in a distributed and scalable manner, which allows for guaranteed global convergence even on large graphs. We also examine a non-convex extension of this approach. We then demonstrate that many types of problems can be expressed in our framework. We focus on three in particular --- binary classification, predicting housing prices, and event detection in time series data --- comparing the network lasso to baseline approaches and showing that it is both a fast and accurate method of solving large optimization problems.

...read moreread less

Journal Article•DOI•

Simultaneously Structured Models With Application to Sparse and Low-Rank Matrices

[...]

Samet Oymak¹, Amin Jalali², Maryam Fazel², Yonina C. Eldar³, Babak Hassibi⁴ - Show less +1 more•Institutions (4)

University of California, Berkeley¹, University of Washington², Technion – Israel Institute of Technology³, California Institute of Technology⁴

09 Feb 2015-IEEE Transactions on Information Theory

TL;DR: This framework applies to arbitrary structure-inducing norms as well as to a wide range of measurement ensembles, and allows us to give sample complexity bounds for problems such as sparse phase retrieval and low-rank tensor completion.

...read moreread less

Abstract: Recovering structured models (e.g., sparse or group-sparse vectors, low-rank matrices) given a few linear observations have been well-studied recently. In various applications in signal processing and machine learning, the model of interest is structured in several ways, for example, a matrix that is simultaneously sparse and low rank. Often norms that promote the individual structures are known, and allow for recovery using an orderwise optimal number of measurements (e.g., $\ell _{1}$ norm for sparsity, nuclear norm for matrix rank). Hence, it is reasonable to minimize a combination of such norms. We show that, surprisingly, using multiobjective optimization with these norms can do no better, orderwise, than exploiting only one of the structures, thus revealing a fundamental limitation in sample complexity. This result suggests that to fully exploit the multiple structures, we need an entirely new convex relaxation. Further, specializing our results to the case of sparse and low-rank matrices, we show that a nonconvex formulation recovers the model from very few measurements (on the order of the degrees of freedom), whereas the convex problem combining the $\ell _{1}$ and nuclear norms requires many more measurements, illustrating a gap between the performance of the convex and nonconvex recovery problems. Our framework applies to arbitrary structure-inducing norms as well as to a wide range of measurement ensembles. This allows us to give sample complexity bounds for problems such as sparse phase retrieval and low-rank tensor completion.

...read moreread less

Journal Article•DOI•

Massive MIMO in Real Propagation Environments: Do All Antennas Contribute Equally?

[...]

Xiang Gao¹, Ove Edfors¹, Fredrik Tufvesson¹, Erik G. Larsson²•Institutions (2)

Lund University¹, Linköping University²

29 Jul 2015-IEEE Transactions on Communications

TL;DR: A substantial reduction in the number of RF chains can be achieved for real massive MIMO channels, without significant performance loss, by performing antenna selection using simple algorithms.

...read moreread less

Abstract: Massive MIMO can greatly increase both spectral and transmit-energy efficiency. This is achieved by allowing the number of antennas and RF chains to grow very large. However, the challenges include high system complexity and hardware energy consumption. Here we investigate the possibilities to reduce the required number of RF chains, by performing antenna selection. While this approach is not a very effective strategy for theoretical independent Rayleigh fading channels, a substantial reduction in the number of RF chains can be achieved for real massive MIMO channels, without significant performance loss. We evaluate antenna selection performance on measured channels at 2.6 GHz, using a linear and a cylindrical array, both having 128 elements. Sum-rate maximization is used as the criterion for antenna selection. A selection scheme based on convex optimization is nearly optimal and used as a benchmark. The achieved sum-rate is compared with that of a very simple scheme that selects the antennas with the highest received power. The power-based scheme gives performance close to the convex optimization scheme, for the measured channels. This observation indicates a potential for significant reductions of massive MIMO implementation complexity, by reducing the number of RF chains and performing antenna selection using simple algorithms.

...read moreread less

Journal Article•DOI•

Optimal Dimensioning and Power Management of a Fuel Cell/Battery Hybrid Bus via Convex Programming

[...]

Xiaosong Hu¹, Nikolce Murgovski¹, Lars Johannesson¹, Bo Egardt¹•Institutions (1)

Chalmers University of Technology¹

01 Feb 2015-IEEE-ASME Transactions on Mechatronics

TL;DR: In this article, convex programming is extended to rapidly and efficiently optimize both the power management strategy and sizes of the fuel cell system (FCS) and the battery pack in the hybrid bus.

...read moreread less

Abstract: This paper is concerned with the simultaneous optimal component sizing and power management of a fuel cell/battery hybrid bus. Existing studies solve the combined plant/controller optimization problem for fuel cell hybrid vehicles (FCHVs) by using methods with disadvantages of heavy computational burden and/or suboptimality, for which only a single driving profile was often considered. This paper adds three important contributions to the FCHVs-related literature. First, convex programming is extended to rapidly and efficiently optimize both the power management strategy and sizes of the fuel cell system (FCS) and the battery pack in the hybrid bus. The main purpose is to encourage more researchers and engineers in FCHVs field to utilize the new effective tool. Second, the influence of the driving pattern on the optimization result (both the component sizes and hydrogen economy) of the bus is systematically investigated by considering three different bus driving routes, including two standard testing cycles and a realistic bus line cycle with slope information in Gothenburg, Sweden. Finally, the sensitivity of the optimization outcome to the potential price decreases of the FCS and the battery is quantitatively examined.

...read moreread less

Journal Article•DOI•

Distribution Locational Marginal Pricing Through Quadratic Programming for Congestion Management in Distribution Networks

[...]

Shaojun Huang¹, Qiuwei Wu¹, Shmuel S. Oren², Ruoyang Li², Zhaoxi Liu¹ - Show less +1 more•Institutions (2)

Technical University of Denmark¹, University of California, Berkeley²

01 Jul 2015-IEEE Transactions on Power Systems

TL;DR: In this article, the authors proposed a distribution locational marginal pricing (DLMP) method through quadratic programming (QP) designed to alleviate the congestion that might occur in a distribution network with high penetration of flexible demands.

...read moreread less

Abstract: This paper presents the distribution locational marginal pricing (DLMP) method through quadratic programming (QP) designed to alleviate the congestion that might occur in a distribution network with high penetration of flexible demands. In the DLMP method, the distribution system operator (DSO) calculates dynamic tariffs and publishes them to the aggregators, who make the optimal energy plans for the flexible demands. The DLMP through QP instead of linear programing as studied in previous literatures solves the multiple solution issue of the aggregator optimization which may cause the decentralized congestion management by DLMP to fail. It is proven in this paper, using convex optimization theory, the aggregator's optimization problem through QP is strictly convex and has a unique solution. The Karush-Kuhn-Tucker (KKT) conditions and the unique solution of the aggregator optimization ensure that the centralized DSO optimization and the decentralized aggregator optimization converge. Case studies using a distribution network with high penetration of electric vehicles (EVs) and heat pumps (HPs) validate the equivalence of the two optimization setups, and the efficacy of the proposed DLMP through QP for congestion management.

...read moreread less

Proceedings Article•

Asynchronous parallel stochastic gradient for nonconvex optimization

[...]

Xiangru Lian¹, Yijun Huang¹, Yuncheng Li¹, Ji Liu¹•Institutions (1)

University of Rochester¹

07 Dec 2015

TL;DR: An ergodic convergence rate is established for both asynchronous parallel implementations of stochastic gradient and it is proved that the linear speedup is achievable if the number of workers is bounded by $\sqrt{K}$ ($K$ is the total number of iterations).

...read moreread less

Abstract: Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in solving deep neural network and received many successes in practice recently. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill the gaps in theory and provide theoretical supports, this paper studies two asynchronous parallel implementations of SG: one is over a computer network and the other is on a shared memory system. We establish an ergodic convergence rate O(1/ √K) for both algorithms and prove that the linear speedup is achievable if the number of workers is bounded by √K (K is the total number of iterations). Our results generalize and improve existing analysis for convex minimization.

...read moreread less

Journal Article•DOI•

On the complexity analysis of randomized block-coordinate descent methods

[...]

Zhaosong Lu¹, Lin Xiao²•Institutions (2)

Simon Fraser University¹, Microsoft²

01 Aug 2015-Mathematical Programming

TL;DR: This paper extends Nesterov’s technique for analyzing the RBCD method for minimizing a smooth convex function over a block-separable closed convex set to obtain a sharper expected-value type of convergence rate than the one implied in Richtárik and Takáč (Math Program 144(1–2):1–38, 2014).

...read moreread less

Abstract: In this paper we analyze the randomized block-coordinate descent (RBCD) methods proposed in Nesterov (SIAM J Optim 22(2):341---362, 2012), Richtarik and Takaa? (Math Program 144(1---2):1---38, 2014) for minimizing the sum of a smooth convex function and a block-separable convex function, and derive improved bounds on their convergence rates. In particular, we extend Nesterov's technique developed in Nesterov (SIAM J Optim 22(2):341---362, 2012) for analyzing the RBCD method for minimizing a smooth convex function over a block-separable closed convex set to the aforementioned more general problem and obtain a sharper expected-value type of convergence rate than the one implied in Richtarik and Takaa? (Math Program 144(1---2):1---38, 2014). As a result, we also obtain a better high-probability type of iteration complexity. In addition, for unconstrained smooth convex minimization, we develop a new technique called randomized estimate sequence to analyze the accelerated RBCD method proposed by Nesterov (SIAM J Optim 22(2):341---362, 2012) and establish a sharper expected-value type of convergence rate than the one given in Nesterov (SIAM J Optim 22(2):341---362, 2012).

...read moreread less

Journal Article•DOI•

Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning

[...]

Julien Mairal

12 May 2015-Siam Journal on Optimization

TL;DR: In this article, an incremental majorization-minimization scheme for minimizing a large sum of continuous functions is proposed, where the upper bounds approximate the objective up to a smooth error; such upper bounds are called first-order surrogate functions.

...read moreread less

Abstract: Majorization-minimization algorithms consist of successively minimizing a sequence of upper bounds of the objective function. These upper bounds are tight at the current estimate, and each iteration monotonically drives the objective function downhill. Such a simple principle is widely applicable and has been very popular in various scientific fields, especially in signal processing and statistics. We propose an incremental majorization-minimization scheme for minimizing a large sum of continuous functions, a problem of utmost importance in machine learning. We present convergence guarantees for nonconvex and convex optimization when the upper bounds approximate the objective up to a smooth error; we call such upper bounds “first-order surrogate functions.” More precisely, we study asymptotic stationary point guarantees for nonconvex problems, and for convex ones, we provide convergence rates for the expected objective function value. We apply our scheme to composite optimization and obtain a new incremental proximal gradient algorithm with linear convergence rate for strongly convex functions. Our experiments show that our method is competitive with the state of the art for solving machine learning problems such as logistic regression when the number of training samples is large enough, and we demonstrate its usefulness for sparse estimation with nonconvex penalties.

...read moreread less

Journal Article•DOI•

An Inertial Forward-Backward Algorithm for Monotone Inclusions

[...]

Dirk A. Lorenz¹, Thomas Pock²•Institutions (2)

Braunschweig University of Technology¹, Austrian Institute of Technology²

01 Feb 2015-Journal of Mathematical Imaging and Vision

TL;DR: In this article, an inertial forward-backward splitting algorithm is proposed to compute a zero of the sum of two monotone operators, with one of the two operators being co-coercive.

...read moreread less

Abstract: In this paper, we propose an inertial forward-backward splitting algorithm to compute a zero of the sum of two monotone operators, with one of the two operators being co-coercive. The algorithm is inspired by the accelerated gradient method of Nesterov, but can be applied to a much larger class of problems including convex-concave saddle point problems and general monotone inclusions. We prove convergence of the algorithm in a Hilbert space setting and show that several recently proposed first-order methods can be obtained as special cases of the general algorithm. Numerical results show that the proposed algorithm converges faster than existing methods, while keeping the computational cost of each iteration basically unchanged.

...read moreread less

Journal Article•DOI•

Splitting Methods for Convex Clustering

[...]

Eric C. Chi¹, Kenneth Lange²•Institutions (2)

Rice University¹, University of California, Los Angeles²

10 Dec 2015-Journal of Computational and Graphical Statistics

TL;DR: This work presents two splitting methods for solving the convex clustering problem, an instance of the alternating direction method of multipliers (ADMM) and the alternating minimization algorithm (AMA), and demonstrates the performance of the ADMM and AMA on both simulated and real data examples.

...read moreread less

Abstract: Clustering is a fundamental problem in many scientific applications. Standard methods such as k-means, Gaussian mixture models, and hierarchical clustering, however, are beset by local minima, which are sometimes drastically suboptimal. Recently introduced convex relaxations of k-means and hierarchical clustering shrink cluster centroids toward one another and ensure a unique global minimizer. In this work, we present two splitting methods for solving the convex clustering problem. The first is an instance of the alternating direction method of multipliers (ADMM); the second is an instance of the alternating minimization algorithm (AMA). In contrast to previously considered algorithms, our ADMM and AMA formulations provide simple and unified frameworks for solving the convex clustering problem under the previously studied norms and open the door to potentially novel norms. We demonstrate the performance of our algorithm on both simulated and real data examples. While the differences between the two algori...

...read moreread less

Posted Content•

Exact tensor completion using t-SVD

[...]

Zemin Zhang¹, Shuchin Aeron¹•Institutions (1)

Tufts University¹

16 Feb 2015-arXiv: Learning

TL;DR: In this paper, a tensor-Singular Value Decomposition (t-SVD) based tensor tubal rank decomposition is proposed to recover multidimensional arrays from limited sampling.

...read moreread less

Abstract: In this paper we focus on the problem of completion of multidimensional arrays (also referred to as tensors) from limited sampling. Our approach is based on a recently proposed tensor-Singular Value Decomposition (t-SVD) [1]. Using this factorization one can derive notion of tensor rank, referred to as the tensor tubal rank, which has optimality properties similar to that of matrix rank derived from SVD. As shown in [2] some multidimensional data, such as panning video sequences exhibit low tensor tubal rank and we look at the problem of completing such data under random sampling of the data cube. We show that by solving a convex optimization problem, which minimizes the tensor nuclear norm obtained as the convex relaxation of tensor tubal rank, one can guarantee recovery with overwhelming probability as long as samples in proportion to the degrees of freedom in t-SVD are observed. In this sense our results are order-wise optimal. The conditions under which this result holds are very similar to the incoherency conditions for the matrix completion, albeit we define incoherency under the algebraic set-up of t-SVD. We show the performance of the algorithm on some real data sets and compare it with other existing approaches based on tensor flattening and Tucker decomposition.

...read moreread less

Posted Content•

Energy-Efficient Resource Allocation for Wireless Powered Communication Networks

[...]

Qingqing Wu¹, Meixia Tao¹, Derrick Wing Kwan Ng², Wen Chen¹, Robert Schober³ - Show less +1 more•Institutions (3)

Shanghai Jiao Tong University¹, University of New South Wales², University of Erlangen-Nuremberg³

17 Nov 2015-arXiv: Information Theory

TL;DR: This paper considers a wireless powered communication network (WPCN), where multiple users harvest energy from a dedicated power station and then communicate with an information receiving station, and shows that the EE maximization problem for the WPCN can be cast intoEE maximization problems for two simplified networks via exploiting its special structure.

...read moreread less

Abstract: This paper considers a wireless powered communication network (WPCN), where multiple users harvest energy from a dedicated power station and then communicate with an information receiving station. Our goal is to investigate the maximum achievable energy efficiency (EE) of the network via joint time allocation and power control while taking into account the initial battery energy of each user. We first study the EE maximization problem in the WPCN without any system throughput requirement. We show that the EE maximization problem for the WPCN can be cast into EE maximization problems for two simplified networks via exploiting its special structure. For each problem, we derive the optimal solution and provide the corresponding physical interpretation, despite the non-convexity of the problems. Subsequently, we study the EE maximization problem under a minimum system throughput constraint. Exploiting fractional programming theory, we transform the resulting non-convex problem into a standard convex optimization problem. This allows us to characterize the optimal solution structure of joint time allocation and power control and to derive an efficient iterative algorithm for obtaining the optimal solution. Simulation results verify our theoretical findings and demonstrate the effectiveness of the proposed joint time and power optimization.

...read moreread less

Journal Article•DOI•

Non-Stationary Stochastic Optimization

[...]

Omar Besbes¹, Yonatan Gur¹, Assaf Zeevi¹•Institutions (1)

Columbia University¹

14 Sep 2015-Operations Research

TL;DR: In this article, the authors consider a non-stationary variant of a sequential stochastic optimization problem, in which the underlying cost functions may change along the horizon and propose a measure, termed variation budget, that controls the extent of said change, and study how restrictions on this budget impact achievable performance.

...read moreread less

Abstract: We consider a non-stationary variant of a sequential stochastic optimization problem, in which the underlying cost functions may change along the horizon. We propose a measure, termed variation budget, that controls the extent of said change, and study how restrictions on this budget impact achievable performance. We identify sharp conditions under which it is possible to achieve long-run average optimality and more refined performance measures such as rate optimality that fully characterize the complexity of such problems. In doing so, we also establish a strong connection between two rather disparate strands of literature: (1) adversarial online convex optimization and (2) the more traditional stochastic approximation paradigm (couched in a non-stationary setting). This connection is the key to deriving well-performing policies in the latter, by leveraging structure of optimal policies in the former. Finally, tight bounds on the minimax regret allow us to quantify the “price of non-stationarity,” which ...

...read moreread less

Journal Article•DOI•

Fault Detection for T-S Fuzzy Time-Delay Systems: Delta Operator and Input-Output Methods

[...]

Hongyi Li¹, Yabin Gao¹, Ligang Wu², Hak-Keung Lam³•Institutions (3)

Bohai University¹, Harbin Institute of Technology², King's College London³

01 Feb 2015-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: By designing a filter to generate a residual signal, the fault detection problem addressed in this paper can be converted into a filtering problem and the time-varying delay is approximated by the two-term approximation method.

...read moreread less

Abstract: This paper focuses on the problem of fault detection for Takagi–Sugeno fuzzy systems with time-varying delays via delta operator approach. By designing a filter to generate a residual signal, the fault detection problem addressed in this paper can be converted into a filtering problem. The time-varying delay is approximated by the two-term approximation method. Fuzzy augmented fault detection system is constructed in $\delta $ -domain, and a threshold function is given. By applying the scaled small gain theorem and choosing a Lyapunov–Krasovskii functional in $\delta $ -domain, a sufficient condition of asymptotic stability with a prescribed $H_\infty $ disturbance attenuation level is derived for the proposed fault detection system. Then, a solvability condition for the designed fault detection filter is established, with which the desired filter can be obtained by solving a convex optimization problem. Finally, an example is given to demonstrate the feasibility and effectiveness of the proposed method.

...read moreread less

Journal Article•DOI•

On the Convergence of Alternating Minimization for Convex Programming with Applications to Iteratively Reweighted Least Squares and Decomposition Schemes

[...]

Amir Beck¹•Institutions (1)

Technion – Israel Institute of Technology¹

15 Jan 2015-Siam Journal on Optimization

TL;DR: This paper analyzes the convergence rate of the alternating minimization method and establishes a nonasymptotic sublinear rate of convergence where the multiplicative constant depends on the minimal block Lipschitz constant, and studies the convergence properties of a decomposition-based approach designed to solve convex problems involving sums of norms.

...read moreread less

Abstract: This paper is concerned with the alternating minimization (AM) method for solving convex minimization problems where the decision variables vector is split into two blocks. The objective function is a sum of a differentiable convex function and a separable (possibly) nonsmooth extended real-valued convex function, and consequently constraints can be incorporated. We analyze the convergence rate of the method and establish a nonasymptotic sublinear rate of convergence where the multiplicative constant depends on the minimal block Lipschitz constant. We then analyze the iteratively reweighted least squares (IRLS) method for solving convex problems involving sums of norms. Based on the results derived for the AM method, we establish a nonasymptotic sublinear rate of convergence of the IRLS method. In addition, we show an asymptotic rate of convergence whose efficiency estimate does not depend on the data of the problem. Finally, we study the convergence properties of a decomposition-based approach designed t...

...read moreread less

Journal Article•DOI•

Online Convex Optimization in Dynamic Environments

[...]

Eric C. Hall¹, Rebecca Willett²•Institutions (2)

Duke University¹, University of Wisconsin-Madison²

18 Feb 2015-IEEE Journal of Selected Topics in Signal Processing

TL;DR: A dynamic mirror descent framework is described which addresses the challenge of adapting to nonstationary environments arising in real-world problems, yielding low theoretical regret bounds and accurate, adaptive, and computationally efficient algorithms which are applicable to broad classes of problems.

...read moreread less

Abstract: High-velocity streams of high-dimensional data pose significant “big data” analysis challenges across a range of applications and settings. Online learning and online convex programming play a significant role in the rapid recovery of important or anomalous information from these large datastreams. While recent advances in online learning have led to novel and rapidly converging algorithms, these methods are unable to adapt to nonstationary environments arising in real-world problems. This paper describes a dynamic mirror descent framework which addresses this challenge, yielding low theoretical regret bounds and accurate, adaptive, and computationally efficient algorithms which are applicable to broad classes of problems. The methods are capable of learning and adapting to an underlying and possibly time-varying dynamical model. Empirical results in the context of dynamic texture analysis, solar flare detection, sequential compressed sensing of a dynamic scene, traffic surveillance, tracking self-exciting point processes and network behavior in the Enron email corpus support the core theoretical findings.

...read moreread less

Collapse