
Showing papers on "Rate of convergence" published in 2021


Journal ArticleDOI
TL;DR: The results suggest that the accuracy of NSFnets, for both laminar and turbulent flows, can be improved with proper tuning of weights (manual or dynamic) in the loss function.

303 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose FEDL, a federated learning algorithm that can handle heterogeneous UE data without further assumptions beyond strongly convex and smooth loss functions, and provide a convergence rate characterizing the trade-off between local computation rounds of each UE to update its local model and global communication rounds to update the FL global model.
Abstract: There is an increasing interest in a fast-growing machine learning technique called Federated Learning (FL), in which the model training is distributed over mobile user equipment (UEs), exploiting UEs’ local computation and training data. Despite its advantages such as preserving data privacy, FL still has challenges of heterogeneity across UEs’ data and physical resources. To address these challenges, we first propose FEDL, an FL algorithm which can handle heterogeneous UE data without further assumptions except strongly convex and smooth loss functions. We provide a convergence rate characterizing the trade-off between local computation rounds of each UE to update its local model and global communication rounds to update the FL global model. We then employ FEDL in wireless networks as a resource allocation optimization problem that captures the trade-off between FEDL convergence wall-clock time and energy consumption of UEs with heterogeneous computing and power resources. Even though the wireless resource allocation problem of FEDL is non-convex, we exploit this problem’s structure to decompose it into three sub-problems and analyze their closed-form solutions as well as insights into problem design. Finally, we empirically evaluate the convergence of FEDL with PyTorch experiments, and provide extensive numerical results for the wireless resource allocation sub-problems. Experimental results show that FEDL outperforms the vanilla FedAvg algorithm in terms of convergence rate and test accuracy in various settings.

193 citations
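A minimal Python sketch of the local-computation / global-communication round structure that both FEDL and FedAvg share. The quadratic local losses, step size, and number of local steps are illustrative assumptions; FEDL's actual local surrogate problem and its hyper-parameters are not reproduced here.

```python
import numpy as np

# Hypothetical quadratic local losses f_i(w) = 0.5 * ||A_i w - b_i||^2,
# used only to make the round structure concrete.
rng = np.random.default_rng(0)
dim, n_ues = 5, 4
A = [rng.normal(size=(20, dim)) for _ in range(n_ues)]
b = [rng.normal(size=20) for _ in range(n_ues)]

def local_grad(i, w):
    return A[i].T @ (A[i] @ w - b[i])

def federated_round(w_global, local_steps=10, lr=1e-2):
    """One communication round: each UE refines the global model locally,
    then the server averages the local models (FedAvg-style aggregation)."""
    local_models = []
    for i in range(n_ues):
        w = w_global.copy()
        for _ in range(local_steps):          # local computation rounds
            w -= lr * local_grad(i, w)
        local_models.append(w)
    return np.mean(local_models, axis=0)      # global communication round

w = np.zeros(dim)
for t in range(50):
    w = federated_round(w)
```

The trade-off analyzed in the paper is between `local_steps` (per-UE computation) and the number of outer rounds (communication).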


Journal ArticleDOI
TL;DR: A novel particle swarm optimization (PSO) algorithm is put forward where a sigmoid-function-based weighting strategy is developed to adaptively adjust the acceleration coefficients, inspired by the activation function of neural networks.
Abstract: In this paper, a novel particle swarm optimization (PSO) algorithm is put forward where a sigmoid-function-based weighting strategy is developed to adaptively adjust the acceleration coefficients. The newly proposed adaptive weighting strategy takes into account both the distances from the particle to the global best position and from the particle to its personal best position, thereby having the distinguishing feature of enhancing the convergence rate. Inspired by the activation function of neural networks, the new strategy is employed to update the acceleration coefficients by using the sigmoid function. The search capability of the developed adaptive weighting PSO (AWPSO) algorithm is comprehensively evaluated via eight well-known benchmark functions including both the unimodal and multimodal cases. The experimental results demonstrate that the designed AWPSO algorithm substantially improves the convergence rate of the particle swarm optimizer and also outperforms some currently popular PSO algorithms.

160 citations
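A minimal sketch of a sigmoid-weighted PSO velocity update in the spirit of AWPSO. The exact functional form mapping distances to acceleration coefficients, and the bounds `c_lo`/`c_hi`, are assumptions for illustration rather than the paper's formulas.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def awpso_step(x, v, pbest, gbest, w=0.7, c_lo=0.5, c_hi=2.5):
    """One velocity/position update with distance-driven, sigmoid-weighted
    acceleration coefficients (illustrative form, not the paper's exact one).
    x, v, pbest: arrays of shape (n_particles, dim); gbest: shape (dim,)."""
    rng = np.random.default_rng()
    d_p = np.linalg.norm(pbest - x, axis=1, keepdims=True)   # distance to personal best
    d_g = np.linalg.norm(gbest - x, axis=1, keepdims=True)   # distance to global best
    c1 = c_lo + (c_hi - c_lo) * sigmoid(d_p)                 # stronger pull when far away
    c2 = c_lo + (c_hi - c_lo) * sigmoid(d_g)
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v
```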


Journal ArticleDOI
TL;DR: New tensor methods for unconstrained convex optimization are developed, which solve at each iteration an auxiliary problem of minimizing a convex multivariate polynomial, together with an efficient technique for solving the auxiliary problem based on the recently developed relative smoothness condition.
Abstract: In this paper we develop new tensor methods for unconstrained convex optimization, which solve at each iteration an auxiliary problem of minimizing a convex multivariate polynomial. We analyze the simplest scheme, based on minimization of a regularized local model of the objective function, and its accelerated version obtained in the framework of estimating sequences. Their rates of convergence are compared with the worst-case lower complexity bounds for the corresponding problem classes. Finally, for the third-order methods, we suggest an efficient technique for solving the auxiliary problem, which is based on the recently developed relative smoothness condition (Bauschke et al. in Math Oper Res 42:330–348, 2017; Lu et al. in SIOPT 28(1):333–354, 2018). With this elaboration, the third-order methods become implementable and very fast. The rate of convergence in terms of the function value for the accelerated third-order scheme reaches the level $O(1/k^4)$, where $k$ is the number of iterations. This is very close to the lower bound of the order $O(1/k^5)$, which is also justified in this paper. At the same time, in many important cases the computational cost of one iteration of this method remains on the level typical for second-order methods.

131 citations


Journal ArticleDOI
TL;DR: This work proposes an iterative optimization algorithm that is based on the projected gradient method (PGM) and derives the step size that guarantees the convergence of the proposed algorithm and defines a backtracking line search to improve its convergence rate.
Abstract: Reconfigurable intelligent surfaces (RISs) represent a new technology that can shape the radio wave propagation in wireless networks and offers a great variety of possible performance and implementation gains. Motivated by this, we study the achievable rate optimization for multi-stream multiple-input multiple-output (MIMO) systems equipped with an RIS, and formulate a joint optimization problem of the covariance matrix of the transmitted signal and the RIS elements. To solve this problem, we propose an iterative optimization algorithm that is based on the projected gradient method (PGM). We derive the step size that guarantees the convergence of the proposed algorithm and we define a backtracking line search to improve its convergence rate. Furthermore, we introduce the total free space path loss (FSPL) ratio of the indirect and direct links as a first-order measure of the applicability of RISs in the considered communication system. Simulation results show that the proposed PGM achieves the same achievable rate as a state-of-the-art benchmark scheme, but with a significantly lower computational complexity. In addition, we demonstrate that the RIS application is particularly suitable to increase the achievable rate in indoor environments, as even a small number of RIS elements can provide a substantial achievable rate gain.

127 citations
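A minimal sketch of the textbook projected gradient method with a backtracking line search, assuming a generic feasible set and the standard sufficient-decrease test; the paper's RIS-specific updates (covariance matrix and RIS phase shifts, each projected onto its own set) are not reproduced.

```python
import numpy as np

def projected_gradient(f, grad_f, project, x0, step0=1.0, beta=0.5, max_iter=200):
    """Generic projected gradient method with backtracking line search."""
    x = project(x0)
    for _ in range(max_iter):
        g, t = grad_f(x), step0
        while True:                                   # backtracking on the step size
            x_new = project(x - t * g)
            d = x_new - x
            # standard sufficient-decrease test for projected gradient steps
            if f(x_new) <= f(x) + g @ d + (1.0 / (2.0 * t)) * (d @ d) or t < 1e-12:
                break
            t *= beta
        if np.linalg.norm(x_new - x) < 1e-9:
            break
        x = x_new
    return x

# Toy example: minimize ||x - y||^2 over the unit Euclidean ball.
y = np.array([2.0, 1.0])
x_opt = projected_gradient(lambda x: np.sum((x - y) ** 2),
                           lambda x: 2.0 * (x - y),
                           lambda x: x / max(1.0, np.linalg.norm(x)),
                           x0=np.zeros(2))
```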


Journal ArticleDOI
TL;DR: In this article, lower complexity bounds of first-order methods on large-scale saddle-point problems are derived, including the special case of affinely constrained smooth convex optimization, for methods whose iterates lie in the linear span of past first-order information.
Abstract: On solving a convex-concave bilinear saddle-point problem (SPP), there have been many works studying the complexity results of first-order methods. These results are all about upper complexity bounds, which can determine at most how many iterations would guarantee a solution of desired accuracy. In this paper, we pursue the opposite direction by deriving lower complexity bounds of first-order methods on large-scale SPPs. Our results apply to the methods whose iterates are in the linear span of past first-order information, as well as more general methods that produce their iterates in an arbitrary manner based on first-order information. We first work on the affinely constrained smooth convex optimization that is a special case of SPP. Different from gradient method on unconstrained problems, we show that first-order methods on affinely constrained problems generally cannot be accelerated from the known convergence rate $O(1/t)$ to $O(1/t^2)$, and in addition, $O(1/t)$ is optimal for convex problems. Moreover, we prove that for strongly convex problems, $O(1/t^2)$ is the best possible convergence rate, while it is known that gradient methods can have linear convergence on unconstrained problems. Then we extend these results to general SPPs. It turns out that our lower complexity bounds match with several established upper complexity bounds in the literature, and thus they are tight and indicate the optimality of several existing first-order methods.

125 citations


Journal ArticleDOI
TL;DR: In this paper, a distributed cooperative compound tracking issue of the vehicular platoon is studied, where the convergence time of the proposed algorithm does not depend on the initial values and design parameters, and simulation experiments are given to further verify the effectiveness of the presented theoretical findings.
Abstract: This article focuses on the distributed cooperative compound tracking issue of the vehicular platoon. First, a definition, called compound tracking control, is proposed, which means that the practical finite-time stability and asymptotical convergence can be simultaneously satisfied. Then, a modified performance function, named finite-time performance function, is designed, which possesses the faster convergence rate compared to the existing ones. Moreover, the adaptive neural network (NN), prescribed performance technique, and backstepping method are utilized to design a distributed cooperative regulation protocol. It is worth noting that the convergence time of the proposed algorithm does not depend on the initial values and design parameters. Finally, simulation experiments are given to further verify the effectiveness of the presented theoretical findings.

87 citations


Journal ArticleDOI
TL;DR: This article investigates the analog gradient aggregation (AGA) solution to overcome the communication bottleneck for wireless federated learning applications by exploiting the idea of analog over-the-air transmission by proposing a novel design of both the transceiver and learning algorithm.
Abstract: This article investigates the analog gradient aggregation (AGA) solution to overcome the communication bottleneck for wireless federated learning applications by exploiting the idea of analog over-the-air transmission. Despite the various advantages, this special transmission solution also brings new challenges to both transceiver design and learning algorithm design due to the nonstationary local gradients and the time-varying wireless channels in different communication rounds. To address these issues, we propose a novel design of both the transceiver and learning algorithm for the AGA solution. In particular, the parameters in the transceiver are optimized with the consideration of the nonstationarity in the local gradients based on a simple feedback variable. Moreover, a novel learning rate design is proposed for the stochastic gradient descent algorithm, which is adaptive to the quality of the gradient estimation. Theoretical analyses are provided on the convergence rate of the proposed AGA solution. Finally, the effectiveness of the proposed solution is confirmed by two separate experiments based on linear regression and the shallow neural network. The simulation results verify that the proposed solution outperforms various state-of-the-art baseline schemes with a much faster convergence speed.

86 citations
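A toy Python sketch of the analog over-the-air aggregation idea: workers transmit their gradients simultaneously on the same channel, so the receiver directly observes a noisy sum. The noise level and the plain averaging are illustrative assumptions; the paper's feedback-based transceiver optimization and adaptive learning rate are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def ota_aggregate(local_grads, noise_std=0.01):
    """Toy analog over-the-air aggregation: the channel superposes the
    simultaneously transmitted gradients and adds receiver noise, and the
    server rescales the result into an (imperfect) average gradient."""
    superposed = np.sum(local_grads, axis=0)                  # waveform superposition
    noise = rng.normal(scale=noise_std, size=superposed.shape)
    return (superposed + noise) / len(local_grads)

# Example: aggregate three workers' gradients of dimension 4.
grads = [rng.normal(size=4) for _ in range(3)]
g_hat = ota_aggregate(grads)
```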


Journal ArticleDOI
TL;DR: In this article, a chaotic cloud quantum bat algorithm (CCQBA) is proposed to improve the performance of BA by using a 3D cat mapping chaotic disturbance mechanism to increase population diversity.
Abstract: The bat algorithm (BA) has fast convergence, a simple structure, and strong search ability. However, the standard BA has poor local search ability in the late evolution stage because it references the historical speed; its population diversity also declines rapidly. Moreover, since it lacks a mutation mechanism, it easily falls into local optima. To improve its performance, this paper develops a hybrid approach that improves its evolution mechanism, local search mechanism, mutation mechanism, and other mechanisms. First, the quantum computing mechanism (QCM) is used to update the searching position in the BA to improve its global convergence. Secondly, after a particular number of iterations the individuals are sorted: the X-condition cloud generator is used to help individuals with better fitness values increase the rate of convergence, while the individuals with poor fitness values undergo a 3D cat mapping chaotic disturbance to increase population diversity and thereby enable the BA to jump out of a local optimum. Thus, a hybrid optimization algorithm, the chaotic cloud quantum bat algorithm (CCQBA), is proposed. To test the performance of the proposed CCQBA, it is compared with alternative algorithms on nine classical benchmark functions. The results of the comparison demonstrate that the convergence accuracy and convergence speed of the proposed CCQBA are significantly better than those of the other algorithms. Thus, the proposed CCQBA represents a better method than others for solving complex problems.

85 citations


Journal ArticleDOI
TL;DR: In this article, the authors analyzed several methods for approximating gradients of noisy functions using only function values, including finite differences, linear interpolation, Gaussian smoothing, and smoothing on a sphere.
Abstract: In this paper, we analyze several methods for approximating gradients of noisy functions using only function values. These methods include finite differences, linear interpolation, Gaussian smoothing, and smoothing on a sphere. The methods differ in the number of functions sampled, the choice of the sample points, and the way in which the gradient approximations are derived. For each method, we derive bounds on the number of samples and the sampling radius which guarantee favorable convergence properties for a line search or fixed step size descent method. To this end, we use the results in Berahas et al. (Global convergence rate analysis of a generic line search algorithm with noise, arXiv:1910.04055 , 2019) and show how each method can satisfy the sufficient conditions, possibly only with some sufficiently large probability at each iteration, as happens to be the case with Gaussian smoothing and smoothing on a sphere. Finally, we present numerical results evaluating the quality of the gradient approximations as well as their performance in conjunction with a line search derivative-free optimization algorithm.

80 citations
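A minimal sketch of two of the gradient-approximation schemes the paper analyzes, forward finite differences and Gaussian smoothing, using only function values. The sampling radius and sample count below are placeholders; choosing them so that a line search or fixed-step method still converges is exactly what the paper's bounds address.

```python
import numpy as np

def fd_gradient(f, x, h=1e-5):
    """Forward finite-difference gradient estimate (n extra function values)."""
    g, fx = np.zeros_like(x), f(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

def gaussian_smoothing_gradient(f, x, sigma=1e-2, n_samples=100, rng=None):
    """Monte-Carlo estimate of the gradient of the Gaussian-smoothed function
    f_sigma(x) = E[f(x + sigma*u)], u ~ N(0, I), via
        grad f_sigma(x) = E[(f(x + sigma*u) - f(x)) / sigma * u]."""
    rng = rng or np.random.default_rng()
    fx, g = f(x), np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        g += (f(x + sigma * u) - fx) / sigma * u
    return g / n_samples

# Example on a smooth quadratic: both estimators approximate grad f(x) = 2x.
f = lambda x: np.sum(x ** 2)
x0 = np.array([1.0, -2.0, 0.5])
print(fd_gradient(f, x0), gaussian_smoothing_gradient(f, x0))
```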


Journal ArticleDOI
TL;DR: The statistical results show the potential performance of NMS-CS on a wide class of optimization problems and its excellent applicability to optimization problems having many constraints.
Abstract: In this paper, a Cuckoo search algorithm, namely the New Movement Strategy of Cuckoo Search (NMS-CS), is proposed. The novelty lies in a random walk whose step lengths are calculated from a Levy distribution. The step lengths in the original Cuckoo search (CS), which are significant terms in simulating the Cuckoo bird's movement, are registered as a vector. In NMS-CS, the step lengths are modified from this vector to a scalar called the orientation parameter. This parameter is controlled by a function chosen at random from one of three proposed novel functions. These functions have diverse characteristics, such as convex, concave, and linear shapes, and establish a new movement strategy for the Cuckoo birds in NMS-CS. As a result, the movement of NMS-CS is more flexible than the random walk in the original CS. By using the proposed functions, NMS-CS takes steps that are long enough in the first iterations and short enough in the last iterations. This leads the proposed algorithm to achieve a better convergence rate and accuracy level in comparison with CS. The first 23 classical benchmark functions are selected to illustrate the convergence rate and level of accuracy of NMS-CS in detail compared with the original CS. Then, other algorithms such as Particle Swarm Optimization (PSO), the Gravitational Search Algorithm (GSA), and the Grey Wolf Optimizer (GWO) are employed to compare with NMS-CS in a ranking of the best accuracy. In the end, three engineering design problems (tension/compression spring design, pressure vessel design, and welded beam design) are employed to demonstrate the effectiveness of NMS-CS for solving various real-world problems. The statistical results show the potential performance of NMS-CS on a wide class of optimization problems and its excellent applicability to optimization problems having many constraints. The source code of NMS-CS is publicly available at http://goldensolutionrs.com/codes.html .
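A minimal sketch of the Levy-flight random walk used by the original Cuckoo search, with Levy-distributed step lengths generated by Mantegna's algorithm. NMS-CS's replacement of the step-length vector by a scalar orientation parameter and its three control functions are not modeled here; `alpha` and `beta` are standard illustrative values.

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(dim, beta=1.5, rng=None):
    """Levy-distributed step lengths via Mantegna's algorithm."""
    rng = rng or np.random.default_rng()
    sigma_u = (gamma(1 + beta) * sin(pi * beta / 2) /
               (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_move(x, best, alpha=0.01, rng=None):
    """One Levy-flight move of a nest, biased toward the current best solution."""
    step = levy_step(len(x), rng=rng)
    return x + alpha * step * (x - best)
```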

Journal ArticleDOI
TL;DR: The resulting algorithm, GT-DSGD, enjoys certain desirable characteristics towards minimizing a sum of smooth non-convex functions and improves the currently known best convergence rates and their dependence on problem parameters.
Abstract: In this paper, we study decentralized online stochastic non-convex optimization over a network of nodes. Integrating a technique called gradient tracking in decentralized stochastic gradient descent, we show that the resulting algorithm, GT-DSGD, enjoys certain desirable characteristics towards minimizing a sum of smooth non-convex functions. In particular, for general smooth non-convex functions, we establish non-asymptotic characterizations of GT-DSGD and derive the conditions under which it achieves network-independent performances that match the centralized minibatch SGD. In contrast, the existing results suggest that GT-DSGD is always network-dependent and is therefore strictly worse than the centralized minibatch SGD. When the global non-convex function additionally satisfies the Polyak-Łojasiewicz (PL) condition, we establish the linear convergence of GT-DSGD up to a steady-state error with appropriate constant step-sizes. Moreover, under stochastic approximation step-sizes, we establish, for the first time, the optimal global sublinear convergence rate on almost every sample path, in addition to the asymptotically optimal sublinear rate in expectation. Since strongly convex functions are a special case of the functions satisfying the PL condition, our results are not only immediately applicable but also improve the currently known best convergence rates and their dependence on problem parameters.
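A minimal sketch of the generic gradient-tracking update behind GT-DSGD-type methods: each node mixes with its neighbors and descends along a tracker that follows the network-average stochastic gradient. The mixing matrix, step size, and the `stoch_grad` oracle are assumptions for illustration; the tracker is conventionally initialized as `y = g(x0)`.

```python
import numpy as np

def gt_dsgd_step(W, x, y, g_old, stoch_grad, alpha):
    """One synchronous step of decentralized SGD with gradient tracking.
    W: doubly stochastic mixing matrix (n_nodes x n_nodes);
    x: iterates stacked row-wise (n_nodes x dim); y: gradient trackers;
    stoch_grad(i, x_i): node i's stochastic gradient at x_i."""
    x_new = W @ x - alpha * y                          # consensus + descent along the tracker
    g_new = np.stack([stoch_grad(i, x_new[i]) for i in range(x.shape[0])])
    y_new = W @ y + g_new - g_old                      # y tracks the average stochastic gradient
    return x_new, y_new, g_new
```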

Journal ArticleDOI
TL;DR: An improved PSO with BSA, called PSOBSA, is proposed to resolve the original PSO algorithm's problems; BSA's mutation and crossover operators are modified through the neighborhood to increase the convergence rate.
Abstract: The particle swarm optimization (PSO) is a population-based stochastic optimization technique inspired by the social behavior of bird flocking and fish schooling. The PSO has a high convergence rate. It is prone to losing diversity along the iterative optimization process and may get trapped in a poor local optimum. Overcoming these defects is still a significant problem in PSO applications. In contrast, the backtracking search optimization algorithm (BSA) has a robust global exploration ability, whereas it has a low local exploitation ability and converges slowly. This paper proposes an improved PSO with BSA, called PSOBSA, to resolve the original PSO algorithm's problems; BSA's mutation and crossover operators are modified through the neighborhood to increase the convergence rate. In addition, a new mutation operator is introduced to improve the convergence accuracy and evade local optima. Several benchmark problems are used to test the performance and efficiency of the proposed PSOBSA. The experimental results show that PSOBSA outperforms other well-known metaheuristic algorithms and several state-of-the-art PSO variants in terms of global exploration ability, accuracy, and rate of convergence on almost all of the benchmark problems.

Journal ArticleDOI
TL;DR: This paper proposes and analyzes zeroth-order stochastic approximation algorithms for nonconvex and convex optimization, with a focus on addressing constrained optimization, high-dimensional setting, and saddle point avoiding.
Abstract: In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization, with a focus on addressing constrained optimization, high-dimensional setting, and saddle point avoiding. To handle constrained optimization, we first propose generalizations of the conditional gradient algorithm achieving rates similar to the standard stochastic gradient algorithm using only zeroth-order information. To facilitate zeroth-order optimization in high dimensions, we explore the advantages of structural sparsity assumptions. Specifically, (i) we highlight an implicit regularization phenomenon where the standard stochastic gradient algorithm with zeroth-order information adapts to the sparsity of the problem at hand by just varying the step size and (ii) propose a truncated stochastic gradient algorithm with zeroth-order information, whose rate of convergence depends only poly-logarithmically on the dimensionality. We next focus on avoiding saddle points in nonconvex setting. Toward that, we interpret the Gaussian smoothing technique for estimating gradient based on zeroth-order information as an instantiation of first-order Stein’s identity. Based on this, we provide a novel linear-(in dimension) time estimator of the Hessian matrix of a function using only zeroth-order information, which is based on second-order Stein’s identity. We then provide a zeroth-order variant of cubic regularized Newton method for avoiding saddle points and discuss its rate of convergence to local minima.
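A minimal sketch of the two Stein-identity-based zeroth-order estimators the abstract refers to: a Gaussian-smoothing gradient estimate (first-order Stein's identity) and a Hessian estimate (second-order Stein's identity). The smoothing radius and sample counts are placeholders, not the paper's prescriptions.

```python
import numpy as np

def stein_gradient(f, x, sigma=0.01, n=200, rng=None):
    """Zeroth-order gradient estimate from first-order Stein's identity:
    grad f_sigma(x) = E[(f(x + sigma*u) - f(x)) u] / sigma, u ~ N(0, I)."""
    rng = rng or np.random.default_rng()
    fx, d = f(x), len(x)
    U = rng.standard_normal((n, d))
    fvals = np.array([f(x + sigma * u) for u in U])
    return ((fvals - fx)[:, None] * U).mean(axis=0) / sigma

def stein_hessian(f, x, sigma=0.01, n=500, rng=None):
    """Zeroth-order Hessian estimate from second-order Stein's identity:
    hess f_sigma(x) = E[(f(x + sigma*u) - f(x)) (u u^T - I)] / sigma^2.
    Subtracting f(x) leaves the expectation unchanged but lowers the variance."""
    rng = rng or np.random.default_rng()
    fx, d = f(x), len(x)
    H = np.zeros((d, d))
    for _ in range(n):
        u = rng.standard_normal(d)
        H += (f(x + sigma * u) - fx) * (np.outer(u, u) - np.eye(d))
    return H / (n * sigma ** 2)
```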

Journal ArticleDOI
TL;DR: A general primal-dual algorithmic framework that unifies many existing state-of-the-art algorithms is proposed, and linear convergence of the proposed method to the exact minimizer is established in the presence of the nonsmooth term.
Abstract: This article studies a class of nonsmooth decentralized multiagent optimization problems where the agents aim at minimizing a sum of local strongly-convex smooth components plus a common nonsmooth term. We propose a general primal-dual algorithmic framework that unifies many existing state-of-the-art algorithms. We establish linear convergence of the proposed method to the exact minimizer in the presence of the nonsmooth term. Moreover, for the more general class of problems with agent-specific nonsmooth terms, we show that linear convergence cannot be achieved (in the worst case) for the class of algorithms that uses the gradients and the proximal mappings of the smooth and nonsmooth parts, respectively. We further provide a numerical counterexample that shows how some state-of-the-art algorithms fail to converge linearly for strongly convex objectives and different local nonsmooth terms.

Journal ArticleDOI
TL;DR: In this article, the authors investigated analytical and numerical solutions of fractional fuzzy hybrid systems in Hilbert space, which are devoted to model control systems that are capable of controlling complex systems with continuous time dynamics.
Abstract: The pivotal aim of this paper is to investigate analytical and numerical solutions of a fractional fuzzy hybrid system in Hilbert space. Such fuzzy systems are devoted to modeling control systems that are capable of controlling complex systems that have discrete events with continuous time dynamics. The fractional derivative is described in the Atangana-Baleanu Caputo (ABC) sense, which is distinguished by its non-local and non-singular kernel. In this orientation, the main contribution of the current numerical investigation is to generalize the characterization theory of the integer fuzzy IVP to the ABC-fractional derivative under strongly generalized differentiability, and then to apply the proposed method to deal with the fuzzy hybrid system numerically. This method optimizes the approximate solutions based on the Schmidt orthogonalization process on Sobolev spaces, which can be straightway employed in generating a Fourier expansion within a sensible convergence rate. The reproducing kernel theory is employed to construct a series solution with parametric form for the considered model in the direct-sum space $W_2^2[a,b] \oplus W_2^2[a,b]$. Some theorems related to convergence analysis and approximation error are also proved. Moreover, we obtain the exact solution for the fuzzy model by applying the Laplace transform method, so the results obtained using the proposed method are compared with those of the exact solution. To show the effect of the Atangana-Baleanu fractional operator, we compare the numerical solution of the fractional fuzzy hybrid system with those of integer order. Two numerical examples are carried out to illustrate that such dynamical processes noticeably depend on time instant and time history, which can be efficiently modeled by employing fractional calculus theory. Finally, the accuracy, efficiency, and simplicity of the proposed method are evident in both the classical and fractional cases.

Journal ArticleDOI
TL;DR: Using semidefinite programming and duality it is proved that the norm of the residuals is upper bounded by the distance of the initial iterate to the closest fixed point divided by the number of iterations plus one.
Abstract: In this work, we give a tight estimate of the rate of convergence for the Halpern-iteration for approximating a fixed point of a nonexpansive mapping in a Hilbert space. Specifically, using semidefinite programming and duality we prove that the norm of the residuals is upper bounded by the distance of the initial iterate to the closest fixed point divided by the number of iterations plus one.
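A minimal sketch of the Halpern iteration for a nonexpansive map $T$, using the standard anchoring coefficients $1/(k+2)$; the paper's contribution is the tight $O(1/k)$ bound on the residual $\|x_k - T x_k\|$ in terms of the distance from $x_0$ to the closest fixed point.

```python
import numpy as np

def halpern(T, x0, n_iters=200):
    """Halpern iteration x_{k+1} = lam_k * x0 + (1 - lam_k) * T(x_k)
    with the standard anchoring sequence lam_k = 1/(k+2)."""
    x = x0.copy()
    for k in range(n_iters):
        lam = 1.0 / (k + 2)
        x = lam * x0 + (1.0 - lam) * T(x)
    return x

# Example: T is a rotation by 90 degrees (nonexpansive, unique fixed point 0).
R = np.array([[0.0, -1.0], [1.0, 0.0]])
x = halpern(lambda z: R @ z, np.array([1.0, 1.0]))
print(np.linalg.norm(x - R @ x))   # residual decays like O(1/k)
```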

Journal ArticleDOI
TL;DR: A by-product of this analysis is a tuning recommendation for several existing (non-accelerated) distributed algorithms yielding provably faster (worst-case) convergence rate for the class of problems under consideration.
Abstract: We study distributed composite optimization over networks: agents minimize a sum of smooth (strongly) convex functions–the agents’ sum-utility–plus a nonsmooth (extended-valued) convex one. We propose a general unified algorithmic framework for such a class of problems and provide a convergence analysis leveraging the theory of operator splitting. Distinguishing features of our scheme are: (i) when each of the agent’s functions is strongly convex, the algorithm converges at a linear rate, whose dependence on the agents’ functions and network topology is decoupled; (ii) when the objective function is convex (but not strongly convex), similar decoupling as in (i) is established for the coefficient of the proved sublinear rate, which also reveals the role of function heterogeneity on the convergence rate; (iii) the algorithm can adjust the ratio between the number of communications and computations to achieve a rate (in terms of computations) independent of the network connectivity; and (iv) a by-product of our analysis is a tuning recommendation for several existing (non-accelerated) distributed algorithms, yielding a provably faster (worst-case) convergence rate for the class of problems under consideration.

Journal ArticleDOI
TL;DR: This work numerically considers a thermoelastic problem where the thermal law is modeled using the so-called Moore–Gibson–Thompson equation, and a fully discrete algorithm is introduced and a discrete stability property is proved.

Journal ArticleDOI
TL;DR: A data-driven adaptive classifier is proposed and is shown to simultaneously attain within a logarithmic factor of the optimal rate over a large collection of parameter spaces and characterize precisely the contribution of the observations from the source distribution to the classification task under the target distribution.
Abstract: Human learners have the natural ability to use knowledge gained in one setting for learning in a different but related setting. This ability to transfer knowledge from one task to another is essential for effective learning. In this paper, we study transfer learning in the context of nonparametric classification based on observations from different distributions under the posterior drift model, which is a general framework and arises in many practical problems. We first establish the minimax rate of convergence and construct a rate-optimal two-sample weighted $K$-NN classifier. The results characterize precisely the contribution of the observations from the source distribution to the classification task under the target distribution. A data-driven adaptive classifier is then proposed and is shown to simultaneously attain within a logarithmic factor of the optimal rate over a large collection of parameter spaces. Simulation studies and real data applications are carried out where the numerical results further illustrate the theoretical analysis. Extensions to the case of multiple source distributions are also considered.
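A minimal sketch of a two-sample weighted K-NN classifier of the kind analyzed in the paper, assuming binary labels in {0, 1}. The neighborhood sizes and the weights placed on the source and target votes are placeholders; the paper chooses them as functions of the two sample sizes to attain the minimax rate under posterior drift.

```python
import numpy as np

def weighted_knn_predict(x, Xs, ys, Xt, yt, k_s=5, k_t=5, w_s=0.3, w_t=0.7):
    """Combine votes from the k_s nearest source points and the k_t nearest
    target points with weights (w_s, w_t)."""
    def vote(X, y, k):
        idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
        return y[idx].mean()                        # fraction of label-1 neighbors
    score = w_s * vote(Xs, ys, k_s) + w_t * vote(Xt, yt, k_t)
    return int(score >= 0.5)

# Example with synthetic source and target samples.
rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(200, 2)), rng.integers(0, 2, 200)   # large source sample
Xt, yt = rng.normal(size=(30, 2)), rng.integers(0, 2, 30)     # small target sample
print(weighted_knn_predict(np.zeros(2), Xs, ys, Xt, yt))
```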

Journal ArticleDOI
TL;DR: Several new adaptive laws driven by the derived information of parameter estimation errors are proposed, which achieve a faster convergence rate than conventional gradient descent algorithms and whose exponential error convergence can be rigorously proved under the well-recognized persistent excitation condition.
Abstract: This paper presents an alternative adaptive parameter estimation framework for nonlinear systems with time-varying parameters. Unlike existing techniques that rely on the polynomial approximation of time-varying parameters, the proposed method can directly estimate the unknown time-varying parameters. Moreover, this paper proposes several new adaptive laws driven by the derived information of parameter estimation errors, which achieve faster convergence rate than conventional gradient descent algorithms. In particular, the exponential error convergence can be rigorously proved under the well-recognized persistent excitation condition. The robustness of the developed adaptive estimation schemes against bounded disturbances is also studied. Comparative simulation results reveal that the proposed approaches can achieve better estimation performance than several other estimation algorithms. Finally, the proposed parameter estimation methods are verified by conducting experiments based on a roto-magnet plant.

Journal ArticleDOI
TL;DR: An adaptive fuzzy-type zeroing neural network (AFT-ZNN) model is proposed to solve a time-variant quadratic programming problem by integrating an adaptive fuzzy control strategy that adaptively adjusts its convergence rate according to the value of the computational error.
Abstract: Zeroing neural network (ZNN), as an important class of recurrent neural network, has wide applications in various computation and optimization fields. In this article, based on the traditional-type zeroing neural network (TT-ZNN) model, an adaptive fuzzy-type zeroing neural network (AFT-ZNN) model is proposed to solve a time-variant quadratic programming problem by integrating an adaptive fuzzy control strategy. The most prominent feature of the AFT-ZNN model is to use an adaptive fuzzy control value to adaptively adjust its convergence rate according to the value of the computational error. Four different activation functions are injected to analyze the convergence rate of the AFT-ZNN model. In addition, different membership functions and different ranges of the fuzzy control value are discussed to study the character of the AFT-ZNN model. Theoretical analysis and numerical comparison results further show that the AFT-ZNN model has better performance than the TT-ZNN model.
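A minimal sketch of the ZNN design recipe on the simpler illustrative case of time-varying linear equations $A(t)x = b(t)$ (not the paper's time-variant QP): define the error $e = A x - b$, impose $\dot e = -\gamma\,\phi(e)$, and integrate the resulting dynamics. The fixed gain `gamma` here plays the role that the AFT-ZNN model adapts via fuzzy control.

```python
import numpy as np

def znn_solve(A, dA, b, db, x0, gamma=10.0, dt=1e-3, T=2.0, phi=np.tanh):
    """ZNN for A(t) x = b(t): from e = A x - b and de/dt = -gamma*phi(e),
    one gets x' = A^{-1} (b' - A' x - gamma*phi(e)), integrated by forward Euler."""
    x, t = x0.copy(), 0.0
    while t < T:
        At = A(t)
        e = At @ x - b(t)
        x_dot = np.linalg.solve(At, db(t) - dA(t) @ x - gamma * phi(e))
        x += dt * x_dot
        t += dt
    return x

# Example: track the solution of A(t) x = b(t) with a rotating right-hand side.
A = lambda t: np.array([[2.0, np.sin(t)], [np.sin(t), 3.0]])
dA = lambda t: np.array([[0.0, np.cos(t)], [np.cos(t), 0.0]])
b = lambda t: np.array([np.cos(t), np.sin(t)])
db = lambda t: np.array([-np.sin(t), np.cos(t)])
x_T = znn_solve(A, dA, b, db, x0=np.zeros(2))
```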

Journal ArticleDOI
01 Jul 2021
TL;DR: It is established that the random search method with two-point gradient estimates and a fixed number of roll-outs achieves $\epsilon$-accuracy in $O(\log(1/\epsilon))$ iterations, which significantly improves existing results on the model-free LQR problem.
Abstract: Model-free reinforcement learning techniques directly search over the parameter space of controllers. Although this often amounts to solving a nonconvex optimization problem, for benchmark control problems simple local search methods exhibit competitive performance. To understand this phenomenon, we study the discrete-time Linear Quadratic Regulator (LQR) problem with unknown state-space parameters. In spite of the lack of convexity, we establish that the random search method with two-point gradient estimates and a fixed number of roll-outs achieves $\epsilon$-accuracy in $O(\log(1/\epsilon))$ iterations. This significantly improves existing results on the model-free LQR problem, which require $O(1/\epsilon)$ total roll-outs.
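A minimal sketch of one random-search update with a two-point gradient estimate over the feedback gain. Here `J` stands for a simulation-based LQR cost evaluated by roll-outs (an assumed oracle, not defined here), and the smoothing radius and step size are illustrative.

```python
import numpy as np

def two_point_gradient_step(J, K, r=0.05, step=0.01, rng=None):
    """Random search with a two-point gradient estimate: draw a random
    perturbation U, query the cost at K + rU and K - rU, and move against
    the resulting directional estimate of the gradient."""
    rng = rng or np.random.default_rng()
    U = rng.standard_normal(K.shape)
    U /= np.linalg.norm(U)
    g_hat = (J(K + r * U) - J(K - r * U)) / (2.0 * r) * U   # two-point estimate
    return K - step * g_hat
```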

Journal ArticleDOI
TL;DR: This article establishes exponential stability for the ordinary differential equation (ODE) that governs the gradient-flow dynamics over the set of stabilizing feedback gains and shows that a similar result holds for the gradient descent method that arises from the forward Euler discretization of the corresponding ODE.
Abstract: Model-free reinforcement learning attempts to find an optimal control action for an unknown dynamical system by directly searching over the parameter space of controllers. The convergence behavior and statistical properties of these approaches are often poorly understood because of the nonconvex nature of the underlying optimization problems and the lack of exact gradient computation. In this paper, we take a step towards demystifying the performance and efficiency of such methods by focusing on the standard infinite-horizon linear quadratic regulator problem for continuous-time systems with unknown state-space parameters. We establish exponential stability for the ordinary differential equation (ODE) that governs the gradient-flow dynamics over the set of stabilizing feedback gains and show that a similar result holds for the gradient descent method that arises from the forward Euler discretization of the corresponding ODE. We also provide theoretical bounds on the convergence rate and sample complexity of the random search method with two-point gradient estimates. We prove that the required simulation time for achieving $\epsilon$-accuracy in the model-free setup and the total number of function evaluations both scale as $\log(1/\epsilon)$.

Journal ArticleDOI
TL;DR: It is shown that, for $L_2$-approximation of functions from a separable Hilbert space in the worst-case setting, linear algorithms based on function values are almost as powerful as arbitrary linear algorithms if the approximation numbers are square-summable.

Journal ArticleDOI
TL;DR: This article introduces a novel quantization method and shows that if the objective functions are convex or strongly convex, then adaptive quantization does not affect the rate of convergence of the distributed subgradient methods when the communications are quantized, except for a constant that depends on the resolution of the quantizer.
Abstract: We study distributed optimization problems over a network when the communication between the nodes is constrained, and therefore, information that is exchanged between the nodes must be quantized. Recent advances using the distributed gradient algorithm with a quantization scheme at a fixed resolution have established convergence, but at rates significantly slower than when the communications are unquantized. In this article, we introduce a novel quantization method, which we refer to as adaptive quantization, that allows us to match the convergence rates under perfect communications. Our approach adjusts the quantization scheme used by each node as the algorithm progresses: as we approach the solution, we become more certain about where the state variables are localized and adapt the quantizer codebook accordingly. We bound the convergence rates of the proposed method as a function of the communication bandwidth, the underlying network topology, and structural properties of the constituent objective functions. In particular, we show that if the objective functions are convex or strongly convex, then using adaptive quantization does not affect the rate of convergence of the distributed subgradient methods when the communications are quantized, except for a constant that depends on the resolution of the quantizer. To the best of our knowledge, the rates achieved in this article are better than any existing work in the literature for distributed gradient methods under finite communication bandwidths. We also provide numerical simulations that compare convergence properties of the distributed gradient methods with and without quantization for solving distributed regression problems for both quadratic and absolute loss functions.
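A minimal sketch of the idea behind adaptive quantization: with a fixed bit budget, shrinking the quantization range around the current estimate yields ever finer resolution as the iterates localize. The shrinking schedule is a placeholder here, not the paper's rule.

```python
import numpy as np

def adaptive_quantize(x, center, radius, bits=4):
    """Uniform quantization of x over [center - radius, center + radius]
    with 2**bits levels.  An adaptive scheme re-centers and shrinks `radius`
    as the algorithm converges, so the same bit budget becomes more accurate."""
    levels = 2 ** bits
    step = 2.0 * radius / (levels - 1)
    q = np.round((np.clip(x, center - radius, center + radius) - (center - radius)) / step)
    return (center - radius) + q * step

# Example: the same 4-bit budget represents x far more accurately
# once the quantization range has contracted around it.
x = 0.123456
print(adaptive_quantize(x, center=0.0, radius=10.0))   # coarse early-stage codebook
print(adaptive_quantize(x, center=0.1, radius=0.5))    # refined late-stage codebook
```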

Journal ArticleDOI
TL;DR: In this paper, the convergence rate of the inexact version of the augmented Lagrangian method was studied for general convex programs with both equality and inequality constraints, and the global convergence rate was established in terms of the number of gradient evaluations to obtain a primal and/or primal-dual solution with a specified accuracy.
Abstract: Augmented Lagrangian method (ALM) has been popularly used for solving constrained optimization problems. Practically, subproblems for updating primal variables in the framework of ALM usually can only be solved inexactly. The convergence and local convergence speed of ALM have been extensively studied. However, the global convergence rate of the inexact ALM is still open for problems with nonlinear inequality constraints. In this paper, we work on general convex programs with both equality and inequality constraints. For these problems, we establish the global convergence rate of the inexact ALM and estimate its iteration complexity in terms of the number of gradient evaluations to produce a primal and/or primal-dual solution with a specified accuracy. We first establish an ergodic convergence rate result of the inexact ALM that uses constant penalty parameters or geometrically increasing penalty parameters. Based on the convergence rate result, we then apply Nesterov’s optimal first-order method on each primal subproblem and estimate the iteration complexity of the inexact ALM. We show that if the objective is convex, then $O(\varepsilon^{-1})$ gradient evaluations are sufficient to guarantee a primal $\varepsilon$-solution in terms of both primal objective and feasibility violation. If the objective is strongly convex, the result can be improved to $O(\varepsilon^{-\frac{1}{2}}|\log \varepsilon|)$. To produce a primal-dual $\varepsilon$-solution, more gradient evaluations are needed in the convex case, and the number is $O(\varepsilon^{-\frac{4}{3}})$, while for the strongly convex case the number is still $O(\varepsilon^{-\frac{1}{2}}|\log \varepsilon|)$. Finally, we establish a nonergodic convergence rate result of the inexact ALM that uses geometrically increasing penalty parameters. This result is established only for the primal problem. We show that the nonergodic iteration complexity result is in the same order as that for the ergodic result. Numerical experiments on quadratically constrained quadratic programming are conducted to compare the performance of the inexact ALM with different settings.
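A minimal sketch of an inexact ALM loop for an equality-constrained convex program. The inner solver here is a few plain gradient steps rather than the Nesterov accelerated method analyzed in the paper, and the penalty parameter and inner iteration counts are illustrative; only the outer multiplier-update structure is standard.

```python
import numpy as np

def inexact_alm(grad_f, c, jac_c, x0, beta=1.0, inner_iters=50,
                inner_lr=0.1, outer_iters=20):
    """Inexact ALM for min f(x) s.t. c(x) = 0.
    Augmented Lagrangian: f(x) + lam^T c(x) + (beta/2)||c(x)||^2."""
    x = x0.copy()
    lam = np.zeros_like(c(x0))
    for _ in range(outer_iters):
        for _ in range(inner_iters):                       # inexact primal solve
            g = grad_f(x) + jac_c(x).T @ (lam + beta * c(x))
            x -= inner_lr * g
        lam += beta * c(x)                                 # multiplier (dual) update
    return x, lam

# Example: minimize ||x||^2 subject to x1 + x2 = 1 (solution x = [0.5, 0.5]).
sol, mult = inexact_alm(lambda x: 2.0 * x,
                        lambda x: np.array([x[0] + x[1] - 1.0]),
                        lambda x: np.array([[1.0, 1.0]]),
                        x0=np.zeros(2))
```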

Journal ArticleDOI
TL;DR: In this article, a local hybrid kernel meshless approach is proposed to solve the modified time-fractional diffusion problem in the Riemann-Liouville sense, where the distribution of data points over the local support domain where the number of points is almost constant is considered.

Journal ArticleDOI
TL;DR: In this paper, an iterative scheme is proposed which combines the inertial subgradient extragradient method with the viscosity technique and with a self-adaptive stepsize.
Abstract: In this paper, we study a classical monotone and Lipschitz continuous variational inequality and fixed point problems defined on a level set of a convex function in the framework of Hilbert spaces. First, we introduce a new iterative scheme which combines the inertial subgradient extragradient method with viscosity technique and with self-adaptive stepsize. Unlike in many existing subgradient extragradient techniques in literature, the two projections of our proposed algorithm are made onto some half-spaces. Furthermore, we prove a strong convergence theorem for approximating a common solution of the variational inequality and fixed point of an infinite family of nonexpansive mappings under some mild conditions. The main advantages of our method are: the self-adaptive stepsize which avoids the need to know a priori the Lipschitz constant of the associated monotone operator, the two projections made onto some half-spaces, the strong convergence and the inertial technique employed which accelerates convergence rate of the algorithm. Second, we apply our theorem to solve generalised mixed equilibrium problem, zero point problems and convex minimization problem. Finally, we present some numerical examples to demonstrate the efficiency of our algorithm in comparison with other existing methods in literature. Our results improve and extend several existing works in the current literature in this direction.

Journal ArticleDOI
TL;DR: In this paper, a meshless numerical approach is proposed to detect the differential and boundary conditions of equilibrium along with the consistent form of the constitutive laws in the unified gradient elasticity theory for nano-mechanics of torsion.
Abstract: The unified gradient elasticity theory with applications to nano-mechanics of torsion is examined. The Reissner stationary variational principle is invoked to detect the differential and boundary conditions of equilibrium along with the consistent form of the constitutive laws. An efficient meshless numerical approach is established by making recourse to the Reissner variational functional wherein independent series solution of the kinematic and kinetic field variables are proposed. Suitable forms of the coordinate functions, in terms of the Chebyshev polynomials, are introduced to fulfill a set of kinematic and higher-order boundary conditions in the elastic torsion of nano-bars with practical kinematic constraints. Torsional behavior of the unified gradient elastic bar is studied for structural schemes of applicative interest. An excellent agreement between the torsional responses of the nano-bar detected based on the established meshless method and obtained exact analytical solution is realized. The proposed meshless numerical approach is confirmed to have a fast convergence rate and an admissible convergence region in determination of the torsional rotation field with high accuracy. The introduced meshless method is demonstrated to be highly efficacious in characterizing both the softening and stiffening structural behaviors at nano-scale. The presented numerical approach therefore paves the way ahead in mechanics of nano-structures.