
Showing papers on "Rate of convergence" published in 2017


Journal ArticleDOI
TL;DR: New complexity bounds are proved for methods of convex optimization based only on computation of the function value; such methods usually need at most n times more iterations than the standard gradient methods, where n is the dimension of the space of variables.
Abstract: In this paper, we prove new complexity bounds for methods of convex optimization based only on computation of the function value. The search directions of our schemes are normally distributed random Gaussian vectors. It appears that such methods usually need at most n times more iterations than the standard gradient methods, where n is the dimension of the space of variables. This conclusion is true for both nonsmooth and smooth problems. For the latter class, we present also an accelerated scheme with the expected rate of convergence $O(n^2/k^2)$, where k is the iteration counter. For stochastic optimization, we propose a zero-order scheme and justify its expected rate of convergence $O(n/k^{1/2})$. We give also some bounds for the rate of convergence of the random gradient-free methods to stationary points of nonconvex functions, for both smooth and nonsmooth cases. Our theoretical results are supported by preliminary computational experiments.
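
As a concrete illustration of the random gradient-free oracle described above, here is a minimal Python sketch; the test function, smoothing radius, and step size are illustrative assumptions, not values from the paper:

```python
import numpy as np

def gaussian_zero_order_grad(f, x, mu=1e-4, rng=None):
    """Two-point zero-order gradient estimate along a random Gaussian direction:
    g = (f(x + mu*u) - f(x)) / mu * u, with u ~ N(0, I_n)."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x)) / mu * u

# Toy usage: minimize a smooth quadratic without ever evaluating its gradient.
f = lambda v: 0.5 * np.dot(v, v)
x = np.ones(10)
rng = np.random.default_rng(0)
for k in range(2000):
    x -= 0.01 * gaussian_zero_order_grad(f, x, rng=rng)
print(f(x))  # close to 0
```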

859 citations


Journal ArticleDOI
TL;DR: This paper introduces a distributed algorithm, referred to as DIGing, based on a combination of a distributed inexact gradient method and a gradient tracking technique that converges to a global and consensual minimizer over time-varying graphs.
Abstract: This paper considers the problem of distributed optimization over time-varying graphs. For the case of undirected graphs, we introduce a distributed algorithm, referred to as DIGing, based on a combination of a distributed inexact gradient method and a gradient tracking technique. The DIGing algorithm uses doubly stochastic mixing matrices and employs fixed step-sizes and, yet, drives all the agents' iterates to a global and consensual minimizer. When the graphs are directed, in which case the implementation of doubly stochastic mixing matrices is unrealistic, we construct an algorithm that incorporates the push-sum protocol into the DIGing structure, thus obtaining the Push-DIGing algorithm. Push-DIGing uses column stochastic matrices and fixed step-sizes, but it still converges to a global and consensual minimizer. Under the strong convexity assumption, we prove that the algorithms converge at R-linear (geometric) rates as long as the step-sizes do not exceed some upper bounds. We establish explicit estimates for the convergence rates.
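
The DIGing update is compact enough to sketch. Below is a minimal, hypothetical illustration on a fixed ring graph with scalar local quadratics (the paper handles time-varying graphs and general strongly convex costs); the mixing matrix, step size, and data are our assumptions:

```python
import numpy as np

# Each agent i holds f_i(x) = 0.5*(x - b_i)^2, so the global minimizer is the
# average of the b_i.  W is a doubly stochastic mixing matrix for a fixed ring.
n = 5
b = np.arange(n, dtype=float)            # local data
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

grad = lambda x: x - b                   # stacked local gradients
x = np.zeros(n)                          # one scalar iterate per agent
y = grad(x)                              # gradient tracker, y_0 = grad(x_0)
alpha = 0.1
for k in range(200):
    x_new = W @ x - alpha * y            # consensus step minus tracked gradient
    y = W @ y + grad(x_new) - grad(x)    # track the average gradient
    x = x_new
print(x)  # all entries close to b.mean() == 2.0
```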

795 citations


Journal ArticleDOI
TL;DR: In this paper, the stochastic average gradient (SAG) method is analyzed for optimizing the sum of a finite number of smooth convex functions; by incorporating a memory of previous gradient values, it achieves a faster convergence rate than black-box SG methods.
Abstract: We analyze the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method's iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values the SAG method achieves a faster convergence rate than black-box SG methods. The convergence rate is improved from $O(1/\sqrt{k})$ to $O(1/k)$ in general, and when the sum is strongly convex the convergence rate is improved from the sub-linear $O(1/k)$ to a linear convergence rate of the form $O(\rho^k)$ for $\rho < 1$. Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations. This extends our earlier work Le Roux et al. (Adv Neural Inf Process Syst, 2012), which only led to a faster rate for well-conditioned strongly convex problems. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.
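
A minimal sketch of the SAG update on a toy least-squares sum: a table of the most recent per-term gradients is kept, one entry is refreshed per iteration, and the step uses the table average. The step size below is a heuristic assumption, not the paper's constant:

```python
import numpy as np

# f(x) = (1/m) * sum_i 0.5*(a_i^T x - b_i)^2
rng = np.random.default_rng(0)
m, d = 100, 5
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)

x = np.zeros(d)
table = np.zeros((m, d))                 # last seen gradient of each term
avg = table.mean(axis=0)
step = 1.0 / (4 * np.max(np.sum(A**2, axis=1)))  # heuristic step size
for k in range(20000):
    i = rng.integers(m)
    g_new = (A[i] @ x - b[i]) * A[i]
    avg += (g_new - table[i]) / m        # maintain the running average in O(d)
    table[i] = g_new
    x -= step * avg
print(np.linalg.norm(A.T @ (A @ x - b)) / m)  # near-zero gradient norm
```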

769 citations


Journal ArticleDOI
TL;DR: This paper establishes the global R-linear convergence of the ADMM for minimizing the sum of any number of convex separable functions, assuming that a certain error bound condition holds true and the dual stepsize is sufficiently small.
Abstract: We analyze the convergence rate of the alternating direction method of multipliers (ADMM) for minimizing the sum of two or more nonsmooth convex separable functions subject to linear constraints. Previous analysis of the ADMM typically assumes that the objective function is the sum of only two convex functions defined on two separable blocks of variables even though the algorithm works well in numerical experiments for three or more blocks. Moreover, there has been no rate of convergence analysis for the ADMM without strong convexity in the objective function. In this paper we establish the global R-linear convergence of the ADMM for minimizing the sum of any number of convex separable functions, assuming that a certain error bound condition holds true and the dual stepsize is sufficiently small. Such an error bound condition is satisfied for example when the feasible set is a compact polyhedron and the objective function consists of a smooth strictly convex function composed with a linear mapping, and a nonsmooth $\ell_1$ regularizer. This result implies the linear convergence of the ADMM for contemporary applications such as LASSO without assuming strong convexity of the objective function.
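
For the LASSO application mentioned at the end, here is a minimal sketch of scaled two-block ADMM; the problem data, lam, and rho are illustrative assumptions:

```python
import numpy as np

# Scaled-form two-block ADMM for LASSO:
#   min 0.5*||A x - b||^2 + lam*||z||_1   s.t.  x - z = 0.
rng = np.random.default_rng(0)
m, d, lam, rho = 50, 20, 0.1, 1.0
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)  # prox of t*||.||_1
AtA, Atb = A.T @ A, A.T @ b
x, z, u = np.zeros(d), np.zeros(d), np.zeros(d)
for k in range(500):
    x = np.linalg.solve(AtA + rho * np.eye(d), Atb + rho * (z - u))  # x-update
    z = soft(x + u, lam / rho)                                       # z-update
    u += x - z                                                       # dual update
print(np.max(np.abs(x - z)))  # primal residual, should be tiny
```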

705 citations


Journal ArticleDOI
TL;DR: The final convergence result shows clearly how the regularity of the solution and the grading of the mesh affect the order of convergence of the difference scheme, so one can choose an optimal mesh grading.
Abstract: A reaction-diffusion problem with a Caputo time derivative of order $\alpha\in (0,1)$ is considered. The solution of such a problem is shown in general to have a weak singularity near the initial time $t=0$, and sharp pointwise bounds on certain derivatives of this solution are derived. A new analysis of a standard finite difference method for the problem is given, taking into account this initial singularity. This analysis encompasses both uniform meshes and meshes that are graded in time, and includes new stability and consistency bounds. The final convergence result shows clearly how the regularity of the solution and the grading of the mesh affect the order of convergence of the difference scheme, so one can choose an optimal mesh grading. Numerical results are presented that confirm the sharpness of the error analysis.
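
Since the convergence order hinges on the mesh grading, it helps to see what a graded temporal mesh looks like. A minimal sketch; the values of T, M, and alpha are illustrative, and the grading r = (2 - alpha)/alpha is the kind of choice such analyses identify as optimal, stated here as an assumption rather than a quote from the paper:

```python
import numpy as np

# Graded mesh on [0, T]: t_j = T * (j/M)**r clusters points near t = 0, where
# the solution has its weak singularity; r = 1 recovers the uniform mesh.
T, M, alpha = 1.0, 10, 0.4
r = (2 - alpha) / alpha          # a commonly used grading for such problems
t = T * (np.arange(M + 1) / M) ** r
print(np.round(t, 5))
print(np.round(np.diff(t), 5))   # step sizes grow away from t = 0
```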

573 citations


Proceedings Article
06 Aug 2017
TL;DR: In this article, the authors show that perturbed gradient descent can escape saddle points almost for free, in a number of iterations which depends only poly-logarithmically on dimension.
Abstract: This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free"). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. When all saddle points are non-degenerate, all second-order stationary points are local minima, and our result thus shows that perturbed gradient descent can escape saddle points almost for free. Our results can be directly applied to many machine learning applications, including deep learning. As a particular concrete example of such an application, we show that our results can be used directly to establish sharp global convergence rates for matrix factorization. Our results rely on a novel characterization of the geometry around saddle points, which may be of independent interest to the non-convex optimization community.
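
A minimal sketch of the perturbation mechanism described above: plain gradient descent plus an occasional uniform-ball perturbation at near-stationary points. The thresholds, radius, and test function are illustrative assumptions, and the paper's full algorithm includes safeguards omitted here:

```python
import numpy as np

def perturbed_gd(grad, x, eta=0.01, g_thresh=1e-3, radius=1e-2,
                 t_thresh=50, n_iters=10000, rng=None):
    """When the gradient is small (a candidate saddle) and no perturbation
    happened recently, add noise drawn uniformly from a small ball, then
    continue with plain gradient descent."""
    rng = rng or np.random.default_rng(0)
    last_perturb = -np.inf
    for t in range(n_iters):
        if np.linalg.norm(grad(x)) < g_thresh and t - last_perturb > t_thresh:
            xi = rng.standard_normal(x.shape)
            xi *= radius * rng.random() ** (1.0 / x.size) / np.linalg.norm(xi)
            x = x + xi                   # uniform-in-ball perturbation
            last_perturb = t
        x = x - eta * grad(x)
    return x

# f(w) = 0.5*w0^2 + 0.25*w1^4 - 0.5*w1^2 has a strict saddle at the origin and
# minima at (0, +/-1); plain GD started at 0 stays stuck, the perturbed run escapes.
grad = lambda w: np.array([w[0], w[1] ** 3 - w[1]])
print(perturbed_gd(grad, np.zeros(2)))  # ends near (0, +/-1)
```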

280 citations


Journal ArticleDOI
TL;DR: If each agent is asymptotically null controllable with bounded controls and the interaction topology described by a signed digraph is structurally balanced and contains a spanning tree, then the semi-global bipartite consensus can be achieved for the linear multiagent system by a linear feedback controller with the control gain being designed via the low gain feedback technique.
Abstract: The bipartite consensus problem for a group of homogeneous generic linear agents with input saturation under directed interaction topology is examined. It is established that if each agent is asymptotically null controllable with bounded controls and the interaction topology described by a signed digraph is structurally balanced and contains a spanning tree, then the semi-global bipartite consensus can be achieved for the linear multiagent system by a linear feedback controller with the control gain being designed via the low gain feedback technique. The convergence analysis of the proposed control strategy is performed by means of the Lyapunov method, which can also specify the convergence rate. Finally, the validity of the theoretical findings is demonstrated by two simulation examples.

272 citations


Journal ArticleDOI
TL;DR: In this article, an algorithm for non-convex optimization with global convergence (of the whole sequence) to a critical point is proposed, in which the variables of the underlying problem are treated either as one block or as multiple disjoint blocks.
Abstract: Nonconvex optimization arises in many areas of computational science and engineering. However, most nonconvex optimization algorithms are only known to have local convergence or subsequence convergence properties. In this paper, we propose an algorithm for nonconvex optimization and establish its global convergence (of the whole sequence) to a critical point. In addition, we give its asymptotic convergence rate and numerically demonstrate its efficiency. In our algorithm, the variables of the underlying problem are either treated as one block or multiple disjoint blocks. It is assumed that each non-differentiable component of the objective function, or each constraint, applies only to one block of variables. The differentiable components of the objective function, however, can involve multiple blocks of variables together. Our algorithm updates one block of variables at a time by minimizing a certain prox-linear surrogate, along with an extrapolation to accelerate its convergence. The order of update can be either deterministically cyclic or randomly shuffled for each cycle. In fact, our convergence analysis only needs that each block be updated at least once in every fixed number of iterations. We show its global convergence (of the whole sequence) to a critical point under fairly loose conditions including, in particular, the Kurdyka–Łojasiewicz condition, which is satisfied by a broad class of nonconvex/nonsmooth applications. These results, of course, remain valid when the underlying problem is convex. We apply our convergence results to the coordinate descent iteration for non-convex regularized linear regression, as well as a modified rank-one residue iteration for nonnegative matrix factorization. We show that both applications have global convergence. Numerically, we tested our algorithm on nonnegative matrix and tensor factorization problems, where random shuffling clearly improves the chance to avoid low-quality local solutions.
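
A minimal sketch of the block update pattern described above (a prox-linear step per block plus extrapolation), on a toy rank-1 nonnegative factorization; the extrapolation weight and data are illustrative assumptions, and the paper's adaptive weights and safeguards are omitted:

```python
import numpy as np

# min_{x,y >= 0} 0.5*||M - x y^T||_F^2: each block is updated by a gradient
# step on the smooth coupling term followed by projection onto the nonnegative
# orthant (the prox of the indicator), with a simple extrapolation.
rng = np.random.default_rng(0)
M = np.outer(rng.random(6), rng.random(4))   # exactly rank-1 nonnegative data
x, y = rng.random(6), rng.random(4)
x_old, y_old = x.copy(), y.copy()
w = 0.5                                      # extrapolation weight (heuristic)
for k in range(500):
    x_hat = x + w * (x - x_old)              # extrapolate block x
    Lx = max(y @ y, 1e-8)                    # block Lipschitz constant
    x_old, x = x, np.maximum(x_hat - ((x_hat[:, None] * y - M) @ y) / Lx, 0.0)
    y_hat = y + w * (y - y_old)              # extrapolate block y
    Ly = max(x @ x, 1e-8)
    y_old, y = y, np.maximum(y_hat - ((x[:, None] * y_hat - M).T @ x) / Ly, 0.0)
print(np.linalg.norm(M - np.outer(x, y)))    # residual close to 0
```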

259 citations


Posted Content
TL;DR: The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks.
Abstract: Stochastic Gradient Langevin Dynamics (SGLD) is a popular variant of Stochastic Gradient Descent, where properly scaled isotropic Gaussian noise is added to an unbiased estimate of the gradient at each iteration. This modest change allows SGLD to escape local minima and suffices to guarantee asymptotic convergence to global minimizers for sufficiently regular non-convex objectives (Gelfand and Mitter, 1991). The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks. As in the asymptotic setting, our analysis relates the discrete-time SGLD Markov chain to a continuous-time diffusion process. A new tool that drives the results is the use of weighted transportation cost inequalities to quantify the rate of convergence of SGLD to a stationary distribution in the Euclidean $2$-Wasserstein distance.
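
The SGLD step itself is one line. A minimal sketch on a toy least-squares objective; the minibatch size, step size, and inverse temperature beta are illustrative assumptions:

```python
import numpy as np

# SGLD: an SGD step on a minibatch gradient plus isotropic Gaussian noise
# scaled by sqrt(2*eta/beta), where beta is the inverse temperature.
rng = np.random.default_rng(0)
m, d = 200, 3
A = rng.standard_normal((m, d))
b = A @ np.ones(d) + 0.1 * rng.standard_normal(m)

def minibatch_grad(x, batch=10):
    idx = rng.integers(m, size=batch)
    return A[idx].T @ (A[idx] @ x - b[idx]) / batch

x, eta, beta = np.zeros(d), 1e-3, 1e4
for k in range(5000):
    x = x - eta * minibatch_grad(x) + np.sqrt(2 * eta / beta) * rng.standard_normal(d)
print(x)  # hovers near the least-squares solution (all ones) for large beta
```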

251 citations


Proceedings ArticleDOI
19 Jun 2017
TL;DR: Katyusha, as discussed by the authors, is a direct, primal-only stochastic gradient method whose main ingredient is a novel "negative momentum" on top of Nesterov's momentum that can be incorporated into a variance-reduction-based algorithm to speed it up.
Abstract: Nesterov's momentum trick is famously known for accelerating gradient descent, and has been proven useful in building fast iterative algorithms. However, in the stochastic setting, counterexamples exist and prevent Nesterov's momentum from providing similar acceleration, even if the underlying problem is convex. We introduce Katyusha, a direct, primal-only stochastic gradient method to fix this issue. It has a provably accelerated convergence rate in convex (off-line) stochastic optimization. The main ingredient is Katyusha momentum, a novel "negative momentum" on top of Nesterov's momentum that can be incorporated into a variance-reduction based algorithm and speed it up. Since variance reduction has been successfully applied to a growing list of practical problems, our paper suggests that in each of such cases, one could potentially give Katyusha a hug.
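
A simplified sketch of the Katyusha scheme as described, combining an SVRG-style variance-reduced estimator with the extra momentum coupling through the anchor point; the parameter recipe follows the paper's stated choices up to our simplifications (uniform rather than weighted epoch averaging, toy ridge-regression data):

```python
import numpy as np

# f(x) = (1/m) sum_i f_i(x),  f_i(x) = 0.5*(a_i^T x - b_i)^2 + 0.5*sigma*||x||^2
rng = np.random.default_rng(0)
m, d, sigma = 100, 10, 0.1
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)
L = np.max(np.sum(A**2, axis=1)) + sigma       # crude smoothness bound

grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i] + sigma * x
full_grad = lambda x: A.T @ (A @ x - b) / m + sigma * x

tau1 = min(np.sqrt(m * sigma / (3 * L)), 0.5)  # paper-style parameter recipe
tau2 = 0.5
alpha = 1.0 / (3 * tau1 * L)
x_tilde, y, z = np.zeros(d), np.zeros(d), np.zeros(d)
for s in range(30):                            # outer epochs
    mu = full_grad(x_tilde)                    # variance-reduction anchor
    y_sum = np.zeros(d)
    for j in range(2 * m):                     # inner loop of length 2m
        x = tau1 * z + tau2 * x_tilde + (1 - tau1 - tau2) * y
        i = rng.integers(m)
        g = mu + grad_i(x, i) - grad_i(x_tilde, i)   # SVRG-style estimator
        z = z - alpha * g                      # Katyusha momentum lives in z
        y = x - g / (3 * L)
        y_sum += y
    x_tilde = y_sum / (2 * m)                  # uniform average (simplification)
print(np.linalg.norm(full_grad(x_tilde)))      # near-zero gradient norm
```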

Journal ArticleDOI
TL;DR: In this article, a sampling technique based on the Euler discretization of the Langevin stochastic differential equation is studied, and for both constant and decreasing step sizes, non-asymptotic bounds for the convergence to stationarity in both total variation and Wasserstein distances are obtained.
Abstract: Sampling a distribution over a high-dimensional state space is a problem which has recently attracted a lot of research effort; applications include Bayesian non-parametrics, Bayesian inverse problems and aggregation of estimators. All these problems boil down to sampling a target distribution $\pi$ having a density with respect to the Lebesgue measure on $\mathbb{R}^d$, known up to a normalisation factor $x \mapsto \mathrm{e}^{-U(x)}/\int_{\mathbb{R}^d} \mathrm{e}^{-U(y)} \mathrm{d} y$ where $U$ is continuously differentiable and smooth. In this paper, we study a sampling technique based on the Euler discretization of the Langevin stochastic differential equation. Contrary to the Metropolis Adjusted Langevin Algorithm (MALA), we do not apply a Metropolis-Hastings correction. We obtain, for both constant and decreasing step sizes in the Euler discretization, non-asymptotic bounds for the convergence to stationarity in both total variation and Wasserstein distances. Particular attention is paid to the dependence on the dimension of the state space, to demonstrate the applicability of this method in the high-dimensional setting, at least when $U$ is convex. These bounds are based on recently obtained estimates of the convergence of the Langevin diffusion to stationarity using Poincaré and log-Sobolev inequalities. These bounds improve and extend the results of (Dalalyan, 2014). We also investigate the convergence of an appropriately weighted empirical measure and we report sharp bounds for the mean square error and exponential deviation inequality for Lipschitz functions. A limited Monte Carlo experiment is carried out to support our findings.

Journal Article
TL;DR: This work proposes a stochastic primal-dual coordinate method, which alternates between maximizing over one (or more) randomly chosen dual variables and minimizing over the primal variables, and develops an extension to non-smooth and non-strongly convex loss functions.
Abstract: We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors. The problem structure allows us to reformulate it as a convex-concave saddle point problem. We propose a stochastic primal-dual coordinate (SPDC) method, which alternates between maximizing over a randomly chosen dual variable and minimizing over the primal variables. An extrapolation step on the primal variables is performed to obtain accelerated convergence rate. We also develop a mini-batch version of the SPDC method which facilitates parallel computing, and an extension with weighted sampling probabilities on the dual variables, which has a better complexity than uniform sampling on unnormalized data. Both theoretically and empirically, we show that the SPDC method has comparable or better performance than several state-of-the-art optimization methods.

Journal ArticleDOI
TL;DR: In this article, global linear convergence rate bounds for Douglas-Rachford splitting and ADMM are shown under strong convexity and smoothness assumptions, and the bounds are tight for the class of problems under consideration for all feasible algorithm parameters.
Abstract: Recently, several convergence rate results for Douglas-Rachford splitting and the alternating direction method of multipliers (ADMM) have been presented in the literature. In this paper, we show global linear convergence rate bounds for Douglas-Rachford splitting and ADMM under strong convexity and smoothness assumptions. We further show that the rate bounds are tight for the class of problems under consideration for all feasible algorithm parameters. For problems that satisfy the assumptions, we show how to select step-size and metric for the algorithm that optimize the derived convergence rate bounds. For problems with a similar structure that do not satisfy the assumptions, we present heuristic step-size and metric selection methods.
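
A minimal sketch of the Douglas-Rachford iteration the analysis covers, on a toy problem with two prox-friendly terms; the choice of f, g, and gamma is illustrative:

```python
import numpy as np

# min_x f(x) + g(x) with f = 0.5*||x - c||^2 and g = indicator of the box
# [-1, 1]^d.  The iteration is
#   x = prox_{gamma f}(z);  z = z + prox_{gamma g}(2x - z) - x,
# and x converges to the constrained minimizer clip(c, -1, 1).
c = np.array([2.0, -0.3, 0.7])
gamma = 1.0
prox_f = lambda v: (v + gamma * c) / (1 + gamma)  # prox of 0.5*||.-c||^2
prox_g = lambda v: np.clip(v, -1.0, 1.0)          # projection onto the box

z = np.zeros_like(c)
for k in range(100):
    x = prox_f(z)
    z = z + prox_g(2 * x - z) - x
print(x)  # -> [1.0, -0.3, 0.7]
```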

Journal ArticleDOI
TL;DR: Semi-Stochastic Gradient Descent (S2GD), as mentioned in this paper, runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients are computed, following a geometric law.
Abstract: In this paper we study the problem of minimizing the average of a large number of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients are computed, following a geometric law. For strongly convex objectives, the method converges linearly. The total work needed for the method to output an $\epsilon$-accurate solution in expectation, measured in the number of passes over data, is proportional to the condition number of the problem and inversely proportional to the number of functions forming the average. This is achieved by running the method with a number of stochastic gradient evaluations per epoch proportional to the conditioning of the problem. The SVRG method of Johnson and Zhang arises as a special case. As an illustration of the theoretical results, S2GD needs a workload equivalent to only about 2.1 full gradient evaluations to find a $10^{-6}$-accurate solution for a problem with $10^9$ functions and a condition number of $10^3$.
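
A minimal sketch of one S2GD-style run: a full gradient per epoch followed by a random number of variance-reduced inner steps. The simple geometric draw below stands in for the paper's precise distribution, and the data and step size are illustrative assumptions:

```python
import numpy as np

# Inner steps use the variance-reduced estimator
#   g = grad_i(y) - grad_i(x) + full_grad(x).
rng = np.random.default_rng(0)
m, d = 200, 5
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)

grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]
full_grad = lambda x: A.T @ (A @ x - b) / m

h = 0.1 / np.max(np.sum(A**2, axis=1))     # heuristic step size
x = np.zeros(d)
for epoch in range(30):
    mu = full_grad(x)
    y = x.copy()
    n_inner = min(rng.geometric(1.0 / m), 2 * m)  # geometric-law epoch length
    for j in range(n_inner):
        i = rng.integers(m)
        y -= h * (grad_i(y, i) - grad_i(x, i) + mu)
    x = y
print(np.linalg.norm(full_grad(x)))  # small gradient norm
```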

Journal ArticleDOI
TL;DR: In this article, a differentially private Laplacian consensus algorithm was proposed for the multi-agent average consensus problem under the requirement of differential privacy of the agents' initial states against an adversary that has access to all the messages.

Journal ArticleDOI
TL;DR: In this article, the integral version of the Dirichlet homogeneous fractional Laplace equation is considered and the optimal order of convergence for the standard linear finite element method is proved for quasi-uniform as well as graded meshes.
Abstract: This paper deals with the integral version of the Dirichlet homogeneous fractional Laplace equation. For this problem weighted and fractional Sobolev a priori estimates are provided in terms of the Hölder regularity of the data. By relying on these results, optimal order of convergence for the standard linear finite element method is proved for quasi-uniform as well as graded meshes. Some numerical examples are given showing results in agreement with the theoretical predictions.

Journal ArticleDOI
TL;DR: Proper correction formulas at the starting steps of the $k$-step BDF convolution quadrature for discretizing evolution equations are developed to restore the desired $k$th-order convergence rate.
Abstract: We develop proper correction formulas at the starting $k-1$ steps to restore the desired $k$th-order convergence rate of the $k$-step BDF convolution quadrature for discretizing evolution equations...

Journal ArticleDOI
TL;DR: A new distributed algorithm based on the alternating direction method of multipliers (ADMM) is proposed to minimize a sum of locally known convex functions using communication over a network; the analysis highlights the effect of network and communication weights on the convergence rate through the degrees of the nodes, the smallest nonzero eigenvalue, and the operator norm of the communication matrix.
Abstract: We propose a new distributed algorithm based on alternating direction method of multipliers (ADMM) to minimize sum of locally known convex functions using communication over a network. This optimization problem emerges in many applications in distributed machine learning and statistical estimation. Our algorithm allows for a general choice of the communication weight matrix, which is used to combine the iterates at different nodes. We show that when functions are convex, both the objective function values and the feasibility violation converge with rate $O(1/T)$ , where $T$ is the number of iterations. We then show that when functions are strongly convex and have Lipschitz continuous gradients, the sequence generated by our algorithm converges linearly to the optimal solution. In particular, an $\epsilon$ -optimal solution can be computed with $O\left(\sqrt{\kappa _f} \log (1/\epsilon) \right)$ iterations, where $\kappa _f$ is the condition number of the problem. Our analysis highlights the effect of network and communication weights on the convergence rate through degrees of the nodes, the smallest nonzero eigenvalue, and operator norm of the communication matrix.

Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of distributed learning where a network of agents collectively aim to agree on a hypothesis that best explains a set of distributed observations of conditionally independent random processes.
Abstract: We consider the problem of distributed learning, where a network of agents collectively aim to agree on a hypothesis that best explains a set of distributed observations of conditionally independent random processes. We propose a distributed algorithm and establish consistency, as well as a nonasymptotic, explicit, and geometric convergence rate for the concentration of the beliefs around the set of optimal hypotheses. Additionally, if the agents interact over static networks, we provide an improved learning protocol with better scalability with respect to the number of nodes in the network.

Journal ArticleDOI
TL;DR: Considering the fact that the infrared small target is always brighter than its adjacent background, an additional non-negativity constraint on the sparse target patch-image is proposed, which not only removes more undesirable components but also accelerates the convergence rate.

Journal ArticleDOI
TL;DR: A novel, linear, second-order semi-discrete scheme in time to solve the governing system of equations in the hydrodynamic Q-tensor model, developed following the novel 'energy quadratization' strategy so that it is linear and unconditionally energy stable at the semi-discrete level.

Journal ArticleDOI
TL;DR: This paper investigates the recursive parameter and state estimation algorithms for a special class of nonlinear systems (i.e., bilinear state space systems) by using the gradient search and proposes a state observer-based stochastic gradient algorithm and three algorithms derived by means of the multi-innovation theory.
Abstract: This paper investigates the recursive parameter and state estimation algorithms for a special class of nonlinear systems (i.e., bilinear state space systems). A state observer-based stochastic gradient (O-SG) algorithm is presented for the bilinear state space systems by using the gradient search. In order to improve the parameter estimation accuracy and the convergence rate of the O-SG algorithm, a state observer-based multi-innovation stochastic gradient algorithm and a state observer-based recursive least squares identification algorithm are derived by means of the multi-innovation theory. Finally, a numerical example is provided to demonstrate the effectiveness of the proposed algorithms.

Journal ArticleDOI
Peng Hao, Yutian Wang, Chen Liu, Bo Wang, Hao Wu
TL;DR: In this paper, an efficient and robust algorithm for non-probabilistic reliability-based design optimization (NRBDO) is proposed based on the convex model, where the inner loop concerns a min-max problem for the evaluation of the reliability index, and an enhanced chaos control (ECC) method is developed on the basis of chaotic dynamics theory.

Posted Content
TL;DR: In this paper, the authors study several classes of stochastic optimization algorithms enriched with heavy ball momentum and prove global nonasymptotic linear convergence rates for all methods and various measures of success.
Abstract: In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent. We prove global nonasymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates (in the L2 sense), and dual function values. We also show that the primal iterates converge at an accelerated linear rate in the L1 sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., the stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.
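
A minimal sketch of the stochastic heavy ball method analyzed above, i.e., an SGD step plus the momentum term beta*(x_k - x_{k-1}); the least-squares data and parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 100, 5
A = rng.standard_normal((m, d))
b = A @ np.ones(d)                      # consistent system, solution = all ones

alpha, beta = 0.01, 0.5                 # heuristic step size and momentum
x, x_prev = np.zeros(d), np.zeros(d)
for k in range(30000):
    i = rng.integers(m)
    g = (A[i] @ x - b[i]) * A[i]        # stochastic gradient of 0.5*(a_i^T x - b_i)^2
    x, x_prev = x - alpha * g + beta * (x - x_prev), x
print(np.linalg.norm(x - np.ones(d)))   # close to 0 for a consistent system
```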

Journal ArticleDOI
TL;DR: A Fourier pseudo-spectral method that conserves mass and energy is developed for a two-dimensional nonlinear Schrödinger equation, and the method is proved to attain its optimal rate of convergence in the discrete $L^2$ norm without any restrictions on the grid ratio.

Journal ArticleDOI
TL;DR: The proposed Newton-based extremum seeking approach removes the dependence of the convergence rate on the unknown Hessian of the nonlinear map to be optimized, making it user-assignable as in the delay-free literature.
Abstract: In this paper, we address the design and analysis of multi-variable extremum seeking for static maps subject to arbitrarily long time delays. Both Gradient and Newton-based methods are considered. Multi-input systems with different time delays in each individual input channel as well as output delays are dealt with. The phase compensation of the dither signals and the inclusion of predictor feedback with a perturbation-based (averaging-based) estimate of the Hessian allow us to obtain local exponential convergence results to a small neighborhood of the optimal point, even in the presence of delays. The stability analysis is carried out using backstepping transformation and averaging in infinite dimensions, capturing the infinite-dimensional state due to the time delay. In particular, a new backstepping-like transformation is introduced to design the predictor for the Gradient-based extremum seeking scheme with multiple and distinct input delays. The proposed Newton-based extremum seeking approach removes the dependence of the convergence rate on the unknown Hessian of the nonlinear map to be optimized, making it user-assignable as in the delay-free literature. A source seeking example illustrates the performance of the proposed delay-compensated extremum seeking schemes.
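
The gradient scheme (in its delay-free baseline form) reduces to a short simulation loop: a sinusoidal dither probes the unknown map and demodulating the output with the same sinusoid yields a gradient estimate. A minimal, hypothetical sketch; the map Q, dither parameters, and gain are our assumptions:

```python
import numpy as np

Q = lambda th: 5.0 - 0.5 * (th - 2.0) ** 2   # unknown map, maximum at theta = 2
dt, omega, a, k = 1e-3, 50.0, 0.1, 2.0       # dither frequency/amplitude, gain
theta_hat, t = 0.0, 0.0
for n in range(60000):                        # 60 s of simulated time
    y = Q(theta_hat + a * np.sin(omega * t))  # perturbed measurement
    theta_hat += dt * k * np.sin(omega * t) * y  # demodulate and integrate
    t += dt
print(theta_hat)  # close to the optimum 2.0
```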

Journal ArticleDOI
TL;DR: The analysis indicates that the parameter estimates given by the proposed algorithms converge to their true values under the persistent excitation conditions.

Journal ArticleDOI
TL;DR: The linear rate of convergence of the alternating direction method of multipliers (ADMM) for solving linearly constrained convex composite optimization problems is proved, and the usefulness of the obtained results is demonstrated when applied to two- and multi-block convex quadratic (semidefinite) programming.
Abstract: In this paper, we aim to prove the linear rate of convergence of the alternating direction method of multipliers (ADMM) for solving linearly constrained convex composite optimization problems. Under a mild calmness condition, which holds automatically for convex composite piecewise linear-quadratic programming, we establish the global Q-linear rate of convergence for a general semi-proximal ADMM with the dual step-length being taken in $(0, (1+\sqrt{5})/2)$. This semi-proximal ADMM, which covers the classic one, has the advantage of resolving the potential nonsolvability issue of the subproblems in the classic ADMM and possesses the ability to handle multi-block cases efficiently. We demonstrate the usefulness of the obtained results when applied to two- and multi-block convex quadratic (semidefinite) programming.