
Showing papers on "Rate of convergence" published in 2020


Journal ArticleDOI
TL;DR: The processes of nuptial dance and random flight enhance the balance between the algorithm’s exploration and exploitation properties and assist its escape from local optima.

356 citations


Posted Content
TL;DR: A novel gradient descent algorithm is proposed that utilizes the eigenvalues of the NTK to adaptively calibrate the convergence rate of the total training error, and a series of numerical experiments is performed to verify the correctness of the theory and the practical effectiveness of the proposed algorithms.
Abstract: Physics-informed neural networks (PINNs) have lately received great attention thanks to their flexibility in tackling a wide range of forward and inverse problems involving partial differential equations. However, despite their noticeable empirical success, little is known about how such constrained neural networks behave during their training via gradient descent. More importantly, even less is known about why such models sometimes fail to train at all. In this work, we aim to investigate these questions through the lens of the Neural Tangent Kernel (NTK); a kernel that captures the behavior of fully-connected neural networks in the infinite width limit during training via gradient descent. Specifically, we derive the NTK of PINNs and prove that, under appropriate conditions, it converges to a deterministic kernel that stays constant during training in the infinite-width limit. This allows us to analyze the training dynamics of PINNs through the lens of their limiting NTK and find a remarkable discrepancy in the convergence rate of the different loss components contributing to the total training error. To address this fundamental pathology, we propose a novel gradient descent algorithm that utilizes the eigenvalues of the NTK to adaptively calibrate the convergence rate of the total training error. Finally, we perform a series of numerical experiments to verify the correctness of our theory and the practical effectiveness of the proposed algorithms. The data and code accompanying this manuscript are publicly available at \url{this https URL}.
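A heavily simplified sketch of the trace-balancing idea behind such NTK-based weighting (the paper's exact update rule may differ): each loss component's NTK block is K_i = J_i J_iᵀ, so trace(K_i) = ||J_i||_F², and components with smaller traces, which would otherwise train more slowly, receive larger weights. The Jacobians below are random stand-ins for illustration.

```python
import numpy as np

def ntk_trace_weights(jacobians):
    """Heuristic loss-balancing weights from NTK block traces.

    jacobians: list of arrays J_i (n_i x p), the Jacobian of each loss
    term's residuals w.r.t. the network parameters. The i-th NTK block
    is K_i = J_i @ J_i.T, so trace(K_i) = ||J_i||_F^2.
    """
    traces = np.array([np.sum(J**2) for J in jacobians])
    # Weight each loss term inversely to its NTK trace so that all
    # components of the total error decay at comparable rates.
    return traces.sum() / (len(jacobians) * traces)

# Toy usage with random Jacobians standing in for boundary/residual terms.
rng = np.random.default_rng(0)
J_bc, J_res = rng.normal(size=(50, 200)), rng.normal(size=(400, 200))
w_bc, w_res = ntk_trace_weights([J_bc, J_res])
print(w_bc, w_res)  # the smaller-trace term receives the larger weight
```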

308 citations


Proceedings Article
30 Apr 2020
TL;DR: In this paper, the authors analyzed the convergence of Federated Averaging on non-iid data and established a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD iterations.
Abstract: Federated learning enables a large amount of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (\texttt{FedAvg}) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realistic settings. In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. Importantly, our bound demonstrates a trade-off between communication-efficiency and convergence rate. As user devices may be disconnected from the server, we relax the assumption of full device participation to partial device participation and study different averaging schemes; low device participation rate can be achieved without severely slowing down the learning. Our results indicate that heterogeneity of data slows down the convergence, which matches empirical observations. Furthermore, we provide a necessary condition for \texttt{FedAvg} on non-iid data: the learning rate $\eta$ must decay, even if full-gradient is used; otherwise, the solution will be $\Omega (\eta)$ away from the optimal.
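The FedAvg loop being analyzed is easy to state in a few lines. The sketch below assumes full device participation, uniform weighting, and the decaying step size the paper shows is necessary on non-iid data; `grad` is a hypothetical oracle returning a stochastic gradient of device k's local objective.

```python
import numpy as np

def fedavg(grad, w0, num_devices, rounds, local_steps, eta0):
    """Minimal FedAvg sketch: local SGD on each device, periodic averaging."""
    w = w0.copy()
    for t in range(rounds):
        eta = eta0 / (t + 1)  # decaying learning rate, per the paper's necessary condition
        locals_ = []
        for k in range(num_devices):  # full device participation assumed
            wk = w.copy()
            for _ in range(local_steps):  # E local SGD steps before communication
                wk -= eta * grad(wk, k)
            locals_.append(wk)
        w = np.mean(locals_, axis=0)  # server averages the local models
    return w
```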

307 citations


Journal ArticleDOI
TL;DR: This article considers a momentum term that relates to the last iteration of FL; it establishes global convergence properties of MFL, derives an upper bound on the MFL convergence rate, and provides conditions under which MFL accelerates the convergence.
Abstract: Federated learning (FL) provides a communication-efficient approach to solve machine learning problems concerning distributed data, without sending raw data to a central server. However, existing works on FL only utilize first-order gradient descent (GD) and do not consider the preceding iterations to gradient update which can potentially accelerate convergence. In this article, we consider momentum term which relates to the last iteration. The proposed momentum federated learning (MFL) uses momentum gradient descent (MGD) in the local update step of FL system. We establish global convergence properties of MFL and derive an upper bound on MFL convergence rate. Comparing the upper bounds on MFL and FL convergence rates, we provide conditions in which MFL accelerates the convergence. For different machine learning models, the convergence performance of MFL is evaluated based on experiments with MNIST and CIFAR-10 datasets. Simulation results confirm that MFL is globally convergent and further reveal significant convergence improvement over FL.
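A minimal sketch of the local momentum gradient descent (MGD) step that MFL substitutes for plain GD in each client's local update; the aggregation of w (and, in MFL, the momentum state v) across clients follows the usual FL averaging and is omitted here. `grad_fn`, `eta`, and `gamma` are illustrative names, not the paper's notation.

```python
def mfl_local_update(w, v, grad_fn, eta, gamma, steps):
    """One client's momentum gradient descent (MGD) local update (sketch).

    v is the momentum buffer and gamma the momentum coefficient; in MFL
    the server would aggregate both w and v across clients afterwards.
    """
    for _ in range(steps):
        v = gamma * v + grad_fn(w)   # heavy-ball style momentum accumulation
        w = w - eta * v              # parameter step along the momentum
    return w, v
```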

220 citations


Journal ArticleDOI
TL;DR: An adaptive fast nonsingular integral terminal sliding mode control (AFNITSMC) method is proposed that provides the AUV dynamics with a faster convergence rate and demonstrates superiority over the existing ANITSMC method.
Abstract: This article aims to develop an effective control method that can improve the convergence rate over the existing adaptive nonsingular integral terminal sliding mode control (ANITSMC) method for the trajectory tracking control of autonomous underwater vehicles (AUVs). To achieve this goal, an adaptive fast nonsingular integral terminal sliding mode control (AFNITSMC) method is proposed. First, considering that the existing nonsingular integral terminal sliding mode (NITSM) has a slow convergence rate in the region far from the equilibrium point, a fast NITSM (FNITSM) is proposed, which guarantees fast transient convergence both at a distance from and at a close range of the equilibrium point, and therefore increases the convergence rate over the existing NITSM. Then, using this FNITSM and an adaptive technique, an AFNITSMC method is designed for AUVs. It yields local finite-time convergence of the velocity tracking errors to zero and then local exponential convergence of the position tracking errors to zero, without requiring any a priori knowledge of the upper bounds of the uncertainties and disturbances. Compared with the existing ANITSMC method, the salient feature of the proposed AFNITSMC method is that it provides the AUV dynamics with a faster convergence rate. Finally, simulation results demonstrate the efficiency of the proposed AFNITSMC method and its superiority over the existing ANITSMC method.
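A toy scalar simulation of the reaching dynamics can make the "fast" qualifier concrete; the gains k1, k2 and exponent a below are illustrative, not the paper's AUV design. With k1 = 0 one gets the plain terminal term, which is slow far from the origin; adding the linear term (k1 > 0) speeds up the transient at a distance while keeping the finite-time behavior near zero.

```python
import numpy as np

def simulate(x0, k1, k2, a, dt=1e-3, T=5.0):
    """Euler simulation of xdot = -k1*x - k2*|x|^a*sign(x).

    k1 = 0 recovers the plain terminal term (slow far from 0);
    k1 > 0 adds the linear term used by 'fast' sliding surfaces.
    """
    x, traj = x0, []
    for _ in range(int(T / dt)):
        x += dt * (-k1 * x - k2 * abs(x) ** a * np.sign(x))
        traj.append(x)
    return np.array(traj)

slow = simulate(x0=5.0, k1=0.0, k2=1.0, a=0.5)  # terminal term only
fast = simulate(x0=5.0, k1=2.0, k2=1.0, a=0.5)  # fast variant
print(slow[1000], fast[1000])  # the fast variant decays sooner
```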

212 citations


Proceedings Article
15 Jul 2020
TL;DR: One insight of this work is in formalizing how a favorable initial state distribution provides a means to circumvent worst-case exploration issues, analogous to the global convergence guarantees of iterative value function based algorithms.
Abstract: Policy gradient (PG) methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces. However, little is known about even their most basic theoretical convergence properties, including: if and how fast they converge to a globally optimal solution (say with a sufficiently rich policy class); how they cope with approximation error due to using a restricted class of parametric policies; or their finite sample behavior. Such characterizations are important not only to compare these methods to their approximate value function counterparts (where such issues are relatively well understood, at least in the worst case), but also to help with more principled approaches to algorithm design. This work provides provable characterizations of computational, approximation, and sample size issues with regards to policy gradient methods in the context of discounted Markov Decision Processes (MDPs). We focus on both: 1) "tabular" policy parameterizations, where the optimal policy is contained in the class and where we show global convergence to the optimal policy, and 2) restricted policy classes, which may not contain the optimal policy and where we provide agnostic learning results. In the tabular setting, our main results are: 1) a convergence rate to the global optimum for direct parameterization and projected gradient ascent; 2) asymptotic convergence to the global optimum for softmax policy parameterization and PG, and a convergence rate with additional entropy regularization; and 3) dimension-free convergence to the global optimum for softmax policy parameterization and the Natural Policy Gradient (NPG) method with exact gradients. In the function approximation setting, we further analyze NPG with exact as well as inexact gradients under certain smoothness assumptions on the policy parameterization and establish rates of convergence in terms of the quality of the initial state distribution. One insight of this work is in formalizing how a favorable initial state distribution provides a means to circumvent worst-case exploration issues. Overall, these results place PG methods on a solid theoretical footing, analogous to the global convergence guarantees of iterative value function based algorithms.
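For the direct parameterization, the projected-ascent step requires a Euclidean projection onto the probability simplex; the standard sort-based projection is sketched below. The step direction uses raw Q-values for illustration only: the true policy gradient also scales them by the discounted state-visitation measure.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    lam = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + lam, 0.0)

def projected_pg_step(pi_s, q_s, eta):
    """One projected-ascent step on pi(.|s) under direct parameterization:
    move along the (illustrative) action values, project back to the simplex."""
    return project_simplex(pi_s + eta * q_s)

pi = np.array([0.25, 0.25, 0.5])
q = np.array([1.0, 0.0, 0.2])          # illustrative Q-values for one state
print(projected_pg_step(pi, q, eta=0.5))  # a valid distribution, shifted toward arm 0
```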

198 citations


Book
13 Jun 2020
TL;DR: In this article, the authors present a preliminary version of a book on the quantitative homogenization and large-scale regularity theory for elliptic equations in divergence form.
Abstract: This is a preliminary version of a book which presents the quantitative homogenization and large-scale regularity theory for elliptic equations in divergence form. The self-contained presentation gives new and simplified proofs of the core results proved in the last several years, including the algebraic convergence rate for the variational subadditive quantities, the large-scale Lipschitz and higher regularity estimates and Liouville-type results, and optimal quantitative estimates on the first-order correctors and their scaling limit to a Gaussian free field. The last chapter contains new results on the homogenization of the Dirichlet problem, including optimal quantitative estimates of the homogenization error and the two-scale expansion.

183 citations


Journal ArticleDOI
TL;DR: This study investigates the evolution process of a particle swarm optimization algorithm and proposes to incorporate more dynamic information into it, avoiding the accuracy loss caused by premature convergence without extra computational burden.
Abstract: High-dimensional and sparse (HiDS) matrices are frequently found in various industrial applications. A latent factor analysis (LFA) model is commonly adopted to extract useful knowledge from an HiDS matrix, whose parameter training mostly relies on a stochastic gradient descent (SGD) algorithm. However, an SGD-based LFA model's learning rate is hard to tune in real applications, making it vital to implement its self-adaptation. To address this critical issue, this study first carefully investigates the evolution process of a particle swarm optimization algorithm, and then proposes to incorporate more dynamic information into it to avoid the accuracy loss caused by premature convergence without extra computational burden, thereby innovatively achieving a novel position-transitional particle swarm optimization (P2SO) algorithm. It is subsequently adopted to implement a P2SO-based LFA (PLFA) model that builds a learning rate swarm applied to the same group of LFs. Thus, a PLFA model implements highly efficient learning rate adaptation as well as represents an HiDS matrix precisely. Experimental results on four HiDS matrices emerging from real applications demonstrate that, compared with an SGD-based LFA model, a PLFA model no longer suffers from a tedious and expensive tuning process of its learning rate to achieve higher prediction accuracy for missing data.
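For reference, the canonical PSO velocity/position update that P2SO builds on takes only a few lines; the position-transitional refinement with extra dynamic information is omitted here, and the inertia/acceleration constants are conventional defaults rather than the paper's settings.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One canonical PSO velocity/position update (sketch).

    P2SO augments this with additional dynamic (position-transitional)
    information to fight premature convergence; that part is omitted.
    """
    if rng is None:
        rng = np.random.default_rng()
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v
```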

169 citations


Proceedings Article
12 Jul 2020
TL;DR: It is shown that with the true gradient, policy gradient with a softmax parametrization converges at an $O(1/t)$ rate, with constants depending on the problem and initialization, which significantly expands the recent asymptotic convergence results.
Abstract: We make three contributions toward better understanding policy gradient methods in the tabular setting. First, we show that with the true gradient, policy gradient with a softmax parametrization converges at a $O(1/t)$ rate, with constants depending on the problem and initialization. This result significantly expands the recent asymptotic convergence results. The analysis relies on two findings: that the softmax policy gradient satisfies a Łojasiewicz inequality, and the minimum probability of an optimal action during optimization can be bounded in terms of its initial value. Second, we analyze entropy regularized policy gradient and show that it enjoys a significantly faster linear convergence rate $O(e^{-t})$ toward softmax optimal policy. This result resolves an open question in the recent literature. Finally, combining the above two results and additional new $\Omega(1/t)$ lower bound results, we explain how entropy regularization improves policy optimization, even with the true gradient, from the perspective of convergence rate. The separation of rates is further explained using the notion of non-uniform Łojasiewicz degree. These results provide a theoretical understanding of the impact of entropy and corroborate existing empirical studies.
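The entropy-regularized true-gradient update has a compact closed form in the bandit case. In the sketch below (3-arm bandit, illustrative constants), with q = r − τ log π, the exact gradient of the regularized objective with respect to the logits is π ⊙ (q − πᵀq).

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def entropy_reg_pg(r, tau=0.1, eta=0.4, iters=500):
    """True-gradient softmax policy gradient on a bandit with entropy
    regularization tau * H(pi). For q = r - tau*log(pi), the exact gradient
    of the regularized objective w.r.t. the logits is pi * (q - pi.q)."""
    theta = np.zeros_like(r)
    for _ in range(iters):
        pi = softmax(theta)
        q = r - tau * np.log(pi)
        theta += eta * pi * (q - pi @ q)  # exact (non-stochastic) gradient
    return softmax(theta)

print(entropy_reg_pg(np.array([1.0, 0.9, 0.1])))  # near-greedy, smoothed by tau
```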

160 citations


Journal ArticleDOI
TL;DR: A novel concept called stochastic momentum, aimed at decreasing the cost of performing the momentum step, is proposed, and it is proved that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum.
Abstract: In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent: convex quadratic problems. We prove global non-asymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates, and dual function values. We also show that the primal iterates converge at an accelerated linear rate in a somewhat weaker sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesaro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.
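On the convex quadratic setting the paper analyzes, the stochastic heavy ball method is a two-line iteration; the sketch below uses f(x) = (1/2n)||Ax − b||² with single-row sampling and illustrative step/momentum parameters.

```python
import numpy as np

def shb_quadratic(A, b, alpha=0.05, beta=0.9, iters=2000, rng=None):
    """Stochastic heavy ball on f(x) = (1/2n)||Ax - b||^2: an SGD step on a
    sampled row plus the momentum term beta * (x_k - x_{k-1})."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = A.shape
    x_prev = x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)                      # sample one data row
        g = (A[i] @ x - b[i]) * A[i]             # stochastic gradient
        x, x_prev = x - alpha * g + beta * (x - x_prev), x
    return x
```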

121 citations


Posted Content
TL;DR: Theoretical analysis of gradient TD (GTD) reinforcement learning methods implies that the GTD family of algorithms is comparable to, and may indeed be preferred over, existing least-squares TD methods for off-policy learning, due to their linear complexity.
Abstract: In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence, and to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has been not much work on finite-sample analysis for convergent off-policy reinforcement learning algorithms. In this paper, we formulate GTD methods as stochastic gradient algorithms with respect to a primal-dual saddle-point objective function, and then conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP, which offer improved convergence guarantees and acceleration, respectively. The results of our theoretical analysis show that the GTD family of algorithms are indeed comparable to the existing LSTD methods in off-policy learning scenarios.
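The GTD2 update in its primal-dual form is short enough to state directly; the sketch below is the unprojected iteration (projected GTD2 additionally projects theta and w onto bounded sets), with illustrative step sizes alpha and beta.

```python
def gtd2_step(theta, w, phi, phi_next, r, gamma, alpha, beta):
    """One GTD2 update (saddle-point form, sketch).

    theta: value-function weights; w: auxiliary weights estimating the
    expected TD-error direction. phi / phi_next are feature vectors of the
    current and next states sampled under the behavior policy."""
    delta = r + gamma * (phi_next @ theta) - phi @ theta   # TD error
    theta = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    w = w + beta * (delta - phi @ w) * phi
    return theta, w
```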

Journal ArticleDOI
TL;DR: The closed-loop attitude stabilization system is proved to be fixed-time stable, with the convergence time independent of initial states, and the attitude stabilization performance is robust to disturbance and uncertainties in inertia and actuators.
Abstract: A robust fixed-time control framework is presented to stabilize flexible spacecraft’s attitude system with external disturbance, uncertain parameters of inertia, and actuator uncertainty. As a stepping stone, a nonlinear system having faster fixed-time convergence property is preliminarily proposed by introducing a time-varying gain into the conventional fixed-time stability method. This gain improves the convergence rate. Then, a fixed-time observer is proposed to estimate the uncertain torque induced by disturbance, uncertain parameters of inertia, and actuator uncertainty. Fixed-time stability is ensured for the estimation error. Using this estimated knowledge and the full-states’ measurements, a nonsingular terminal sliding controller is finally synthesized. This is achieved via a nonsingular and faster terminal sliding surface with faster convergence rate. The closed-loop attitude stabilization system is proved to be fixed-time stable with the convergence time independent of initial states. The attitude stabilization performance is robust to disturbance and uncertainties in inertia and actuators. Simulation results are also shown to validate the attitude stabilization performance of this control approach.

Journal ArticleDOI
TL;DR: This article establishes a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates, and by relaxing the standard smoothness assumption to Hölder continuity of gradients.
Abstract: Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require imposing a nontrivial assumption on the uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify in practical implementations. In this article, we establish a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates, and by relaxing the standard smoothness assumption to Hölder continuity of gradients. In particular, we establish sufficient conditions for almost sure convergence as well as optimal convergence rates for SGD applied to both general nonconvex and gradient-dominated objective functions. A linear convergence is further derived in the case with zero variances.

Journal ArticleDOI
TL;DR: The GT-VR framework is a stochastic and decentralized framework for minimizing a finite sum of functions available over a network of nodes; it is particularly suitable for problems where large-scale, potentially private data cannot be collected or processed at a centralized server.
Abstract: This paper describes a novel algorithmic framework to minimize a finite-sum of functions available over a network of nodes. The proposed framework, that we call GT-VR, is stochastic and decentralized, and thus is particularly suitable for problems where large-scale, potentially private data, cannot be collected or processed at a centralized server. The GT-VR framework leads to a family of algorithms with two key ingredients: (i) local variance reduction, that enables estimating the local batch gradients from arbitrarily drawn samples of local data; and, (ii) global gradient tracking, which fuses the gradient information across the nodes. Naturally, combining different variance reduction and gradient tracking techniques leads to different algorithms of interest with valuable practical tradeoffs and design considerations. Our focus in this paper is on two instantiations of the GT-VR framework, namely GT-SAGA and GT-SVRG, that, similar to their centralized counterparts (SAGA and SVRG), exhibit a compromise between space and time. We show that both GT-SAGA and GT-SVRG achieve accelerated linear convergence for smooth and strongly convex problems and further describe the regimes in which they achieve non-asymptotic, network-independent linear convergence rates that are faster with respect to the existing decentralized first-order schemes. Moreover, we show that both algorithms achieve a linear speedup in such regimes compared to their centralized counterparts that process all data at a single node. Extensive simulations illustrate the convergence behavior of the corresponding algorithms.
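The common core of the GT-VR family is the gradient-tracking recursion; the sketch below is the deterministic version, where GT-SAGA/GT-SVRG would replace each local gradient call with a variance-reduced stochastic estimator. W is assumed doubly stochastic.

```python
import numpy as np

def gradient_tracking(W, grads, x0, alpha, iters):
    """Decentralized gradient tracking (deterministic sketch).

    W: doubly-stochastic mixing matrix (n x n); grads[i](x) returns node i's
    local gradient. The auxiliary variable Y tracks the network-average
    gradient, which is the ingredient GT-VR combines with variance reduction.
    """
    n = len(grads)
    X = np.tile(x0, (n, 1))                       # one copy of x per node
    G = np.stack([grads[i](X[i]) for i in range(n)])
    Y = G.copy()
    for _ in range(iters):
        X = W @ X - alpha * Y                     # consensus + descent step
        G_new = np.stack([grads[i](X[i]) for i in range(n)])
        Y = W @ Y + G_new - G                     # gradient-tracking update
        G = G_new
    return X.mean(axis=0)
```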

Journal ArticleDOI
TL;DR: This paper associates a forward-backward-forward dynamical system to a pseudo-monotone variational inequality, carries out an asymptotic analysis of the generated trajectories, and proves that linear convergence is guaranteed under strong pseudo-monotonicity.

Journal ArticleDOI
TL;DR: The governing partial differential equation generalizes the Hodgkin–Huxley, the Allen–Cahn and the Fisher–Kolmogorov–Petrovskii–Piscounov equations, and the proposed schemes are shown to be unconditionally stable.
Abstract: For the first time in the literature, semi-implicit spectral approximations for nonlinear Caputo time- and Riesz space-fractional diffusion equations with both smooth and non-smooth solutions are proposed. More precisely, the governing partial differential equation generalizes the Hodgkin–Huxley, the Allen–Cahn and the Fisher–Kolmogorov–Petrovskii–Piscounov equations. The schemes employ a Legendre-based Galerkin spectral method for the Riesz space-fractional derivative, and L1-type approximations with both uniform and graded meshes for the Caputo time-fractional derivative. More importantly, by using fractional Gronwall inequalities and their associated discrete forms, sharp error estimates are proved which show an enhancement in the convergence rate compared with the standard L1 approximation on uniform meshes. This analysis encompasses both uniform meshes and meshes that are graded in time, and unconditional stability is guaranteed. The numerical results that accompany our analysis confirm our theoretical error estimates, and give significant insights into the convergence behavior of our schemes for problems with smooth and non-smooth solutions.
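The L1 approximation on a nonuniform mesh, the time-stepping ingredient discussed above, is a short formula: on each subinterval the weakly singular kernel is integrated exactly and u' is replaced by a difference quotient. The sketch below checks it on u(t) = t over a graded mesh t_j = T(j/N)^r with illustrative parameters.

```python
import numpy as np
from math import gamma as Gamma

def l1_caputo(u, t, a):
    """L1 approximation of the Caputo derivative D_t^a u at t[-1], a in (0,1),
    on an arbitrary (e.g. graded) mesh t[0], ..., t[n]."""
    tn, s = t[-1], 0.0
    for j in range(len(t) - 1):
        # exact integral of (tn - s)^(-a) over [t_j, t_{j+1}], up to Gamma factor
        w = ((tn - t[j]) ** (1 - a) - (tn - t[j + 1]) ** (1 - a)) / Gamma(2 - a)
        s += w * (u[j + 1] - u[j]) / (t[j + 1] - t[j])   # difference quotient for u'
    return s

# Graded mesh t_j = T*(j/N)^r clusters points near t = 0, where solutions of
# time-fractional problems are typically non-smooth.
a, N, r, T = 0.5, 200, 2.0, 1.0
t = T * (np.arange(N + 1) / N) ** r
u = t                       # test function u(t) = t
print(l1_caputo(u, t, a))   # exact value t^(1-a)/Gamma(2-a) = 1.1284 at t = 1
```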

Journal ArticleDOI
TL;DR: In this paper, the homotopy analysis transform method (HATM) is used to solve the time-fractional Korteweg–de Vries (KdV) and Korteweg–de Vries–Burgers (KdVB) equations.

Journal ArticleDOI
TL;DR: On the basis of a regularization technique using the Moreau envelope, a class of first-order algorithms with inertial features involving both viscous and Hessian-driven dampings is extended to non-smooth convex functions with extended real values.
Abstract: In a Hilbert space setting, for convex optimization, we analyze the convergence rate of a class of first-order algorithms involving inertial features. They can be interpreted as discrete time versions of inertial dynamics involving both viscous and Hessian-driven dampings. The geometrical damping driven by the Hessian intervenes in the dynamics in the form $\nabla^2 f(x(t))\,\dot{x}(t)$. By treating this term as the time derivative of $\nabla f(x(t))$, this gives, in discretized form, first-order algorithms in time and space. In addition to the convergence properties attached to Nesterov-type accelerated gradient methods, the algorithms thus obtained are new and show a rapid convergence towards zero of the gradients. On the basis of a regularization technique using the Moreau envelope, we extend these methods to non-smooth convex functions with extended real values. The introduction of time scale factors makes it possible to further accelerate these algorithms. We also report numerical results on structured problems to support our theoretical findings.
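A schematic first-order discretization of such dynamics (not the paper's exact algorithm) replaces the Hessian term by a difference of consecutive gradients, since $\nabla^2 f(x(t))\,\dot{x}(t)$ is the time derivative of $\nabla f(x(t))$; the constants below are illustrative.

```python
import numpy as np

def inertial_hessian_damped(grad, x0, s=0.01, beta=1.0, alpha=3.0, iters=500):
    """Sketch of a first-order scheme with viscous and Hessian-driven damping.

    The Hessian term is discretized as the difference of consecutive
    gradients, so the method stays first-order; constants and the exact
    update differ from the paper's algorithms."""
    x_prev = x = np.asarray(x0, dtype=float)
    g_prev = grad(x)
    for k in range(1, iters + 1):
        g = grad(x)
        y = (x + (k / (k + alpha)) * (x - x_prev)   # inertial extrapolation
             - beta * np.sqrt(s) * (g - g_prev))    # Hessian-driven correction
        x_prev, x, g_prev = x, y - s * grad(y), g   # gradient step at y
    return x
```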

Journal ArticleDOI
TL;DR: This study improves the estimation accuracy for ARX models with missing outputs by introducing a modified Kalman filter, and improves the parameter estimation convergence rate by deriving a new multi-step-length formulation.

Journal ArticleDOI
TL;DR: In this paper, the numerical approximation of solutions to stochastic partial differential equations with additive spatial white noise on bounded domains in $\mathbb{R}^d$ is considered, where the differential operator is given by the fractional power $L^\beta$, $\beta \in (0,1)$, of an integer-order elliptic differential operator $L$ and is therefore nonlocal.
Abstract: The numerical approximation of solutions to stochastic partial differential equations with additive spatial white noise on bounded domains in $\mathbb{R}^d$ is considered. The differential operator is given by the fractional power $L^\beta$, $\beta \in (0,1)$, of an integer-order elliptic differential operator $L$ and is therefore nonlocal. Its inverse $L^{-\beta}$ is represented by a Bochner integral from the Dunford–Taylor functional calculus. By applying a quadrature formula to this integral representation, the inverse fractional-order operator $L^{-\beta}$ is approximated by a weighted sum of nonfractional resolvents $(I + e^{2y_j}L)^{-1}$ at certain quadrature nodes $y_j$. The resolvents are then discretized in space by a standard finite element method. This approach is combined with an approximation of the white noise, which is based only on the mass matrix of the finite element discretization. In this way an efficient numerical algorithm for computing samples of the approximate solution is obtained. For the resulting approximation the strong mean-square error is analyzed and an explicit rate of convergence is derived. Numerical experiments for $L = \kappa^2 - \Delta$, $\kappa > 0$, with homogeneous Dirichlet boundary conditions on the unit cube $(0,1)^d$ in $d = 1, 2, 3$ spatial dimensions for varying $\beta \in (0,1)$ attest to the theoretical results.
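A simplified version of this resolvent-quadrature idea can be sketched with the Balakrishnan formula $L^{-\beta} = \frac{\sin(\pi\beta)}{\pi}\int_0^\infty \lambda^{-\beta}(\lambda I + L)^{-1}\,d\lambda$, discretized after the substitution $\lambda = e^y$; the paper's sinc quadrature and finite element discretization are more refined than this dense 1D finite-difference toy.

```python
import numpy as np

def frac_inverse_apply(L, b, beta, K=100, h=0.15):
    """Apply L^{-beta}, beta in (0,1), via a truncated quadrature of the
    Balakrishnan integral (a simplified sketch of the resolvent-sum idea)."""
    n = L.shape[0]
    out = np.zeros_like(b)
    for j in range(-K, K + 1):
        y = j * h  # lambda = e^y, so the integrand becomes e^{(1-beta)y}(e^y I + L)^{-1}
        out += np.exp((1 - beta) * y) * np.linalg.solve(np.exp(y) * np.eye(n) + L, b)
    return (np.sin(np.pi * beta) / np.pi) * h * out

# 1D operator L = kappa^2 - Delta on (0,1) with Dirichlet BCs, finite differences.
n, kappa = 99, 1.0
hx = 1.0 / (n + 1)
L = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / hx**2 + kappa**2 * np.eye(n)
b = np.ones(n)
x = frac_inverse_apply(L, b, beta=0.5)
# Reference via eigendecomposition: L^{-beta} b = V diag(lam^{-beta}) V^T b
lam, V = np.linalg.eigh(L)
print(np.linalg.norm(x - V @ ((V.T @ b) / lam**0.5)))  # small quadrature error
```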

Journal ArticleDOI
TL;DR: This paper studies a class of distributed convex optimization problems in which each agent only has access to its own local convex objective function, and the estimate of each agent is restricted by both a coupling linear constraint and individual box constraints.
Abstract: This paper studies a class of distributed convex optimization problems by a set of agents, in which each agent only has access to its own local convex objective function and the estimate of each agent is restricted by both a coupling linear constraint and individual box constraints. Our focus is to devise a distributed primal-dual gradient algorithm for working out the problem over a sequence of time-varying general directed graphs. The communications among agents are assumed to be uniformly strongly connected. A column-stochastic mixing matrix and a fixed step-size are applied in the algorithm, which exactly steers all the agents to asymptotically converge to a global optimal solution. Based on the standard strong convexity and the smoothness assumptions of the objective functions, we show that the distributed algorithm is capable of driving the whole network to geometrically converge to an optimal solution of the convex optimization problem, provided that the step-size does not exceed some upper bound. We also give an explicit analysis for the convergence rate of the proposed optimization algorithm. Simulations on economic dispatch problems and demand response problems in power systems are performed to illustrate the effectiveness of the proposed optimization algorithm.

Journal ArticleDOI
TL;DR: This article proposes a general distributed asynchronous algorithmic framework whereby agents can update their local variables and communicate with their neighbors at any time, without any form of coordination; to the best of the authors' knowledge, this is the first distributed algorithm with a provable geometric convergence rate in such a general asynchronous setting.
Abstract: This article studies multiagent (convex and nonconvex) optimization over static digraphs. We propose a general distributed asynchronous algorithmic framework whereby 1) agents can update their local variables as well as communicate with their neighbors at any time, without any form of coordination; and 2) they can perform their local computations using (possibly) delayed, out-of-sync information from the other agents. Delays need not be known to the agent or obey any specific profile, and can also be time-varying (but bounded). The algorithm builds on a tracking mechanism that is robust against asynchrony (in the above sense), whose goal is to estimate locally the average of agents’ gradients. When applied to strongly convex functions, we prove that it converges at an R-linear (geometric) rate as long as the step-size is sufficiently small. A sublinear convergence rate is proved, when nonconvex problems and/or diminishing, uncoordinated step-sizes are considered. To the best of our knowledge, this is the first distributed algorithm with provable geometric convergence rate in such a general asynchronous setting. Preliminary numerical results demonstrate the efficacy of the proposed algorithm and validate our theoretical findings.

Journal ArticleDOI
TL;DR: For a class of priors that admit the structure of a mixture of product measures, the authors propose a novel prior mass condition, under which the variational approximation error of the mean-field class is dominated by the convergence rate of the true posterior.
Abstract: We study convergence rates of variational posterior distributions for nonparametric and high-dimensional inference. We formulate general conditions on prior, likelihood and variational class that characterize the convergence rates. Under similar “prior mass and testing” conditions considered in the literature, the rate is found to be the sum of two terms. The first term stands for the convergence rate of the true posterior distribution, and the second term is contributed by the variational approximation error. For a class of priors that admit the structure of a mixture of product measures, we propose a novel prior mass condition, under which the variational approximation error of the mean-field class is dominated by convergence rate of the true posterior. We demonstrate the applicability of our general results for various models, prior distributions and variational classes by deriving convergence rates of the corresponding variational posteriors.

Journal ArticleDOI
TL;DR: This paper considers a class of representative problems and proposes a novel iterative algorithm for parallel-in-time (PinT) computation that can solve the PDEs at all the discrete time points simultaneously via the recently proposed diagonalization technique.

Journal ArticleDOI
TL;DR: In this article, the Stancu variant of Bernstein–Kantorovich operators based on the shape parameter $\alpha$ was constructed, and the rate of convergence of these operators to any continuous function f(x) on $x \in [0,1]$ was investigated by means of a suitable modulus of continuity and a Voronovskaja-type approximation theorem.
Abstract: We construct the Stancu variant of Bernstein–Kantorovich operators based on the shape parameter $\alpha$. We investigate the rate of convergence of these operators to any continuous function f(x) on $x\in [0,1]$ by means of a suitable modulus of continuity, and prove a Voronovskaja-type approximation theorem. Moreover, we study other approximation properties of our new operators, such as weighted approximation as well as pointwise convergence. Finally, some illustrative graphics are provided here by our new Stancu-type Bernstein–Kantorovich operators in order to demonstrate the significance of our operators.
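The classical Bernstein–Kantorovich operator underlying these constructions is straightforward to evaluate numerically. The sketch below implements only the classical operator (the Stancu/α-shape variant modifies the basis and nodes) and illustrates convergence at a point where f is not differentiable.

```python
import numpy as np
from math import comb

def kantorovich(f, n, x, quad_pts=64):
    """Classical Bernstein-Kantorovich operator on [0,1]:
    K_n(f)(x) = sum_k C(n,k) x^k (1-x)^(n-k) * (n+1) * int_{k/(n+1)}^{(k+1)/(n+1)} f.
    """
    total = 0.0
    for k in range(n + 1):
        lo, hi = k / (n + 1), (k + 1) / (n + 1)
        t = lo + (np.arange(quad_pts) + 0.5) * (hi - lo) / quad_pts
        mean_f = f(t).mean()  # midpoint-rule average of f on the cell = (n+1)*integral
        total += comb(n, k) * x**k * (1 - x) ** (n - k) * mean_f
    return total

f = lambda t: np.abs(t - 0.5)          # continuous, not differentiable at 1/2
for n in (10, 40, 160):
    print(n, kantorovich(f, n, 0.5))   # approaches f(0.5) = 0 as n grows
```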

Journal ArticleDOI
TL;DR: A general approach is proposed to prove the concentration of variational approximations of fractional posteriors, with matrix completion and Gaussian VB as applications.
Abstract: While Bayesian methods are extremely popular in statistics and machine learning, their application to massive data sets is often challenging, when possible at all. The classical MCMC algorithms are prohibitively slow when both the model dimension and the sample size are large. Variational Bayesian methods aim at approximating the posterior by a distribution in a tractable family $\mathcal{F}$. Thus, MCMC are replaced by an optimization algorithm which is orders of magnitude faster. VB methods have been applied in such computationally demanding applications as collaborative filtering, image and video processing or NLP to name a few. However, despite nice results in practice, the theoretical properties of these approximations are not known. We propose a general oracle inequality that relates the quality of the VB approximation to the prior $\pi $ and to the structure of $\mathcal{F}$. We provide a simple condition that allows to derive rates of convergence from this oracle inequality. We apply our theory to various examples. First, we show that for parametric models with log-Lipschitz likelihood, Gaussian VB leads to efficient algorithms and consistent estimators. We then study a high-dimensional example: matrix completion, and a nonparametric example: density estimation.

Journal ArticleDOI
01 Nov 2020
TL;DR: Experimental results and comparisons demonstrate that the proposed Modified Equilibrium Optimizer can be considered a better metaheuristic optimization approach than the other compared algorithms.
Abstract: To alleviate the shortcomings of the standard Equilibrium Optimizer, a new improved algorithm called Modified Equilibrium Optimizer is proposed in this work. This algorithm utilizes the Gaussian mutation and an additional exploratory search mechanism based on the concept of population division and reconstruction. The population in each iteration of the proposed algorithm is constructed using these mechanisms and standard search procedure of the Equilibrium Optimizer. These strategies attempt to maintain the diversity of solutions during the search, so that the tendency of stagnation towards the sub-optimal solutions can be avoided and the convergence rate can be boosted to obtain more accurate optimal solutions. To validate and analyze the performance of the Modified Equilibrium Optimizer, a collection of 33 benchmark problems and four engineering design problems are adopted. Later, in the paper, the Modified Equilibrium Optimizer has been used to train multilayer perceptrons. The experimental results and comparison based on several metrics such as statistical analysis, scalability test, diversity analysis, performance index analysis and convergence analysis demonstrate that the proposed algorithm can be considered a better metaheuristic optimization approach than other compared algorithms.

Journal ArticleDOI
TL;DR: This paper considers the problem of data-driven iterative learning control (DDILC) for a class of nonaffine nonlinear systems subject to data quantization and sensor saturation, and proposes two novel quantized DDILC algorithms based on saturated and quantized information of system outputs.
Abstract: This paper considers the problem of data-driven iterative learning control (DDILC) for a class of nonaffine nonlinear systems subject to data quantization and sensor saturation. Two novel quantized DDILC (QDDILC) algorithms are proposed based on saturated and quantized information of system outputs. The convergence of the proposed QDDILC algorithms is strictly proved, and the effects of output saturation and data quantization are also analyzed. It is shown that sensor saturation does not change the convergence property, although it causes the convergence rate to slow down. For the QDDILC algorithm, data quantization causes the tracking error to converge to a bound depending on the quantization level. However, the modified QDDILC algorithm, which uses a different quantization scheme from the QDDILC algorithm, can ensure that the tracking error converges to zero. Illustrative simulations are exploited to verify the theoretical results.
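A generic P-type learning update from saturated, quantized output errors (a sketch of the mechanism, not the paper's exact QDDILC law) looks as follows; rho, the saturation level, and the quantization step delta are illustrative.

```python
import numpy as np

def qddilc_update(u, e, rho=0.5, sat_level=1.0, delta=0.05):
    """One P-type iterative learning update from saturated, quantized errors.

    The output error e is first saturated by the sensor, then uniformly
    quantized with level delta, and the control input for the next
    iteration is corrected accordingly."""
    e_sat = np.clip(e, -sat_level, sat_level)   # sensor saturation
    e_q = delta * np.round(e_sat / delta)       # uniform quantizer
    return u + rho * e_q                        # learning update

u = np.zeros(5)
e = np.array([1.7, -0.3, 0.08, 0.5, -2.2])      # illustrative tracking-error profile
print(qddilc_update(u, e))
```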

Proceedings Article
10 Jun 2020
TL;DR: The theory for strongly convex objectives tightly matches the known lower bounds for both RR and SO and substantiates the common practical heuristic of shuffling once or only a few times; fast convergence of the Shuffle-Once algorithm, which shuffles the data only once, is also proved.
Abstract: Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its sibling Stochastic Gradient Descent (SGD), RR is usually faster in practice and enjoys significant popularity in convex and non-convex optimization. The convergence rate of RR has attracted substantial attention recently and, for strongly convex and smooth functions, it was shown to converge faster than SGD if 1) the stepsize is small, 2) the gradients are bounded, and 3) the number of epochs is large. We remove these 3 assumptions, improve the dependence on the condition number from $\kappa^2$ to $\kappa$ (resp. from $\kappa$ to $\sqrt{\kappa}$) and, in addition, show that RR has a different type of variance. We argue through theory and experiments that the new variance type gives an additional justification of the superior performance of RR. To go beyond strong convexity, we present several results for non-strongly convex and non-convex objectives. We show that in all cases, our theory improves upon existing literature. Finally, we prove fast convergence of the Shuffle-Once (SO) algorithm, which shuffles the data only once, at the beginning of the optimization process. Our theory for strongly-convex objectives tightly matches the known lower bounds for both RR and SO and substantiates the common practical heuristic of shuffling once or only a few times. As a byproduct of our analysis, we also get new results for the Incremental Gradient algorithm (IG), which does not shuffle the data at all.
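The sampling schemes compared in the paper differ only in how indices are drawn each epoch; a small generator makes the distinction explicit (SGD with replacement is included for contrast).

```python
import numpy as np

def index_stream(n, epochs, scheme, seed=0):
    """Yield sample indices for one run: RR reshuffles every epoch, SO
    shuffles once at the start, IG uses the fixed order, and SGD samples
    with replacement."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)  # used as-is by SO for every epoch
    for _ in range(epochs):
        if scheme == "RR":
            perm = rng.permutation(n)  # fresh shuffle each epoch
        for t in range(n):
            if scheme == "SGD":
                yield rng.integers(n)
            elif scheme == "IG":
                yield t
            else:  # "RR" or "SO"
                yield perm[t]
```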

Journal ArticleDOI
TL;DR: This paper shows that for a sequence of over-relaxation parameters that do not satisfy Nesterov’s rule, one can still expect relatively fast convergence properties for the objective function.
Abstract: In this paper we study the convergence of an Inertial Forward-Backward algorithm with a particular choice of over-relaxation term. In particular, we show that for a sequence of over-relaxation parameters that do not satisfy Nesterov’s rule, one can still expect some relatively fast convergence properties for the objective function. In addition, we complement this work by studying the convergence of the algorithm in the case where the proximal operator is computed inexactly, and we give sufficient conditions on these errors in order to obtain convergence properties for the objective function.
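A minimal sketch of the inertial forward-backward iteration on a lasso instance: f = ½||Ax − b||² supplies the forward gradient step, g = λ||x||₁ the backward (soft-thresholding) prox, and alphas(k) supplies the over-relaxation sequence, which need not follow Nesterov's rule.

```python
import numpy as np

def inertial_fb(A, b, lam, s, alphas, iters=300):
    """Inertial forward-backward for min f(x) + g(x) with f = 0.5||Ax-b||^2
    and g = lam*||x||_1 (prox = soft-thresholding)."""
    soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
    x_prev = x = np.zeros(A.shape[1])
    for k in range(1, iters + 1):
        y = x + alphas(k) * (x - x_prev)          # inertial extrapolation
        x_prev, x = x, soft(y - s * A.T @ (A @ y - b), s * lam)
    return x

rng = np.random.default_rng(1)
A, b = rng.normal(size=(40, 80)), rng.normal(size=40)
s = 1.0 / np.linalg.norm(A, 2) ** 2               # step size below 1/L
x = inertial_fb(A, b, lam=0.1, s=s, alphas=lambda k: (k - 1) / (k + 2))
print(np.count_nonzero(np.abs(x) > 1e-8))         # sparse solution
```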