
Showing papers on "Rate of convergence" published in 2020


Journal ArticleDOI
TL;DR: The processes of nuptial dance and random flight enhance the balance between the algorithm’s exploration and exploitation properties and assist its escape from local optima.

356 citations


Posted Content
TL;DR: A novel gradient descent algorithm is proposed that utilizes the eigenvalues of the NTK to adaptively calibrate the convergence rate of the total training error, and a series of numerical experiments is performed to verify the correctness of the theory and the practical effectiveness of the proposed algorithms.
Abstract: Physics-informed neural networks (PINNs) have lately received great attention thanks to their flexibility in tackling a wide range of forward and inverse problems involving partial differential equations. However, despite their noticeable empirical success, little is known about how such constrained neural networks behave during their training via gradient descent. More importantly, even less is known about why such models sometimes fail to train at all. In this work, we aim to investigate these questions through the lens of the Neural Tangent Kernel (NTK); a kernel that captures the behavior of fully-connected neural networks in the infinite width limit during training via gradient descent. Specifically, we derive the NTK of PINNs and prove that, under appropriate conditions, it converges to a deterministic kernel that stays constant during training in the infinite-width limit. This allows us to analyze the training dynamics of PINNs through the lens of their limiting NTK and find a remarkable discrepancy in the convergence rate of the different loss components contributing to the total training error. To address this fundamental pathology, we propose a novel gradient descent algorithm that utilizes the eigenvalues of the NTK to adaptively calibrate the convergence rate of the total training error. Finally, we perform a series of numerical experiments to verify the correctness of our theory and the practical effectiveness of the proposed algorithms. The data and code accompanying this manuscript are publicly available at \url{this https URL}.
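A heavily simplified sketch of the trace-balancing idea behind such NTK-based weighting (the paper's exact update rule may differ): each loss component's NTK block is K_i = J_i J_iᵀ, so trace(K_i) = ||J_i||_F², and components with smaller traces, which would otherwise train more slowly, receive larger weights. The Jacobians below are random stand-ins for illustration.

```python
import numpy as np

def ntk_trace_weights(jacobians):
    """Heuristic loss-balancing weights from NTK block traces.

    jacobians: list of arrays J_i (n_i x p), the Jacobian of each loss
    term's residuals w.r.t. the network parameters. The i-th NTK block
    is K_i = J_i @ J_i.T, so trace(K_i) = ||J_i||_F^2.
    """
    traces = np.array([np.sum(J**2) for J in jacobians])
    # Weight each loss term inversely to its NTK trace so that all
    # components of the total error decay at comparable rates.
    return traces.sum() / (len(jacobians) * traces)

# Toy usage with random Jacobians standing in for boundary/residual terms.
rng = np.random.default_rng(0)
J_bc, J_res = rng.normal(size=(50, 200)), rng.normal(size=(400, 200))
w_bc, w_res = ntk_trace_weights([J_bc, J_res])
print(w_bc, w_res)  # the smaller-trace term receives the larger weight
```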

308 citations


Proceedings Article
30 Apr 2020
TL;DR: In this paper, the authors analyzed the convergence of Federated Averaging on non-iid data and established a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGD iterations.
Abstract: Federated learning enables a large amount of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (\texttt{FedAvg}) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realistic settings. In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. Importantly, our bound demonstrates a trade-off between communication-efficiency and convergence rate. As user devices may be disconnected from the server, we relax the assumption of full device participation to partial device participation and study different averaging schemes; low device participation rate can be achieved without severely slowing down the learning. Our results indicate that heterogeneity of data slows down the convergence, which matches empirical observations. Furthermore, we provide a necessary condition for \texttt{FedAvg} on non-iid data: the learning rate $\eta$ must decay, even if full-gradient is used; otherwise, the solution will be $\Omega (\eta)$ away from the optimal.
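The FedAvg loop being analyzed is easy to state in a few lines. The sketch below assumes full device participation, uniform weighting, and the decaying step size the paper shows is necessary on non-iid data; `grad` is a hypothetical oracle returning a stochastic gradient of device k's local objective.

```python
import numpy as np

def fedavg(grad, w0, num_devices, rounds, local_steps, eta0):
    """Minimal FedAvg sketch: local SGD on each device, periodic averaging."""
    w = w0.copy()
    for t in range(rounds):
        eta = eta0 / (t + 1)  # decaying learning rate, per the paper's necessary condition
        locals_ = []
        for k in range(num_devices):  # full device participation assumed
            wk = w.copy()
            for _ in range(local_steps):  # E local SGD steps before communication
                wk -= eta * grad(wk, k)
            locals_.append(wk)
        w = np.mean(locals_, axis=0)  # server averages the local models
    return w
```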

307 citations


Journal ArticleDOI
TL;DR: This article considers a momentum term that relates to the last iteration of FL; it establishes global convergence properties of MFL, derives an upper bound on the MFL convergence rate, and provides conditions under which MFL accelerates the convergence.
Abstract: Federated learning (FL) provides a communication-efficient approach to solve machine learning problems concerning distributed data, without sending raw data to a central server. However, existing works on FL only utilize first-order gradient descent (GD) and do not consider the preceding iterations to gradient update which can potentially accelerate convergence. In this article, we consider momentum term which relates to the last iteration. The proposed momentum federated learning (MFL) uses momentum gradient descent (MGD) in the local update step of FL system. We establish global convergence properties of MFL and derive an upper bound on MFL convergence rate. Comparing the upper bounds on MFL and FL convergence rates, we provide conditions in which MFL accelerates the convergence. For different machine learning models, the convergence performance of MFL is evaluated based on experiments with MNIST and CIFAR-10 datasets. Simulation results confirm that MFL is globally convergent and further reveal significant convergence improvement over FL.
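A minimal sketch of the local momentum gradient descent (MGD) step that MFL substitutes for plain GD in each client's local update; the aggregation of w (and, in MFL, the momentum state v) across clients follows the usual FL averaging and is omitted here. `grad_fn`, `eta`, and `gamma` are illustrative names, not the paper's notation.

```python
def mfl_local_update(w, v, grad_fn, eta, gamma, steps):
    """One client's momentum gradient descent (MGD) local update (sketch).

    v is the momentum buffer and gamma the momentum coefficient; in MFL
    the server would aggregate both w and v across clients afterwards.
    """
    for _ in range(steps):
        v = gamma * v + grad_fn(w)   # heavy-ball style momentum accumulation
        w = w - eta * v              # parameter step along the momentum
    return w, v
```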

220 citations


Journal ArticleDOI
TL;DR: An adaptive fast nonsingular integral terminal sliding mode control (AFNITSMC) method is proposed that provides the AUV dynamics with a faster convergence rate and demonstrates superiority over the existing ANITSMC method.
Abstract: This article aims to develop an effective control method that can improve the convergence rate over the existing adaptive nonsingular integral terminal sliding mode control (ANITSMC) method for the trajectory tracking control of autonomous underwater vehicles (AUVs). To achieve this goal, an adaptive fast nonsingular integral terminal sliding mode control (AFNITSMC) method is proposed. First, considering that the existing nonsingular integral terminal sliding mode (NITSM) has a slow convergence rate in the region far from the equilibrium point, a fast NITSM (FNITSM) is proposed, which guarantees fast transient convergence both at a distance from and at a close range of the equilibrium point, and therefore increases the convergence rate over the existing NITSM. Then, using this FNITSM and an adaptive technique, an AFNITSMC method is designed for AUVs. It yields local finite-time convergence of the velocity tracking errors to zero and then local exponential convergence of the position tracking errors to zero, without requiring any a priori knowledge of the upper bounds of the uncertainties and disturbances. Compared with the existing ANITSMC method, the salient feature of the proposed AFNITSMC method is that it provides the AUV dynamics with a faster convergence rate. Finally, simulation results demonstrate the efficiency of the proposed AFNITSMC method and its superiority over the existing ANITSMC method.
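A toy scalar simulation of the reaching dynamics can make the "fast" qualifier concrete; the gains k1, k2 and exponent a below are illustrative, not the paper's AUV design. With k1 = 0 one gets the plain terminal term, which is slow far from the origin; adding the linear term (k1 > 0) speeds up the transient at a distance while keeping the finite-time behavior near zero.

```python
import numpy as np

def simulate(x0, k1, k2, a, dt=1e-3, T=5.0):
    """Euler simulation of xdot = -k1*x - k2*|x|^a*sign(x).

    k1 = 0 recovers the plain terminal term (slow far from 0);
    k1 > 0 adds the linear term used by 'fast' sliding surfaces.
    """
    x, traj = x0, []
    for _ in range(int(T / dt)):
        x += dt * (-k1 * x - k2 * abs(x) ** a * np.sign(x))
        traj.append(x)
    return np.array(traj)

slow = simulate(x0=5.0, k1=0.0, k2=1.0, a=0.5)  # terminal term only
fast = simulate(x0=5.0, k1=2.0, k2=1.0, a=0.5)  # fast variant
print(slow[1000], fast[1000])  # the fast variant decays sooner
```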

212 citations


Proceedings Article
15 Jul 2020
TL;DR: One insight of this work is in formalizing how a favorable initial state distribution provides a means to circumvent worst-case exploration issues, analogous to the global convergence guarantees of iterative value function based algorithms.
Abstract: Policy gradient (PG) methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces. However, little is known about even their most basic theoretical convergence properties, including: if and how fast they converge to a globally optimal solution (say with a sufficiently rich policy class); how they cope with approximation error due to using a restricted class of parametric policies; or their finite sample behavior. Such characterizations are important not only to compare these methods to their approximate value function counterparts (where such issues are relatively well understood, at least in the worst case), but also to help with more principled approaches to algorithm design. This work provides provable characterizations of computational, approximation, and sample size issues with regards to policy gradient methods in the context of discounted Markov Decision Processes (MDPs). We focus on both: 1) "tabular" policy parameterizations, where the optimal policy is contained in the class and where we show global convergence to the optimal policy, and 2) restricted policy classes, which may not contain the optimal policy and where we provide agnostic learning results. In the tabular setting, our main results are: 1) a convergence rate to the global optimum for direct parameterization and projected gradient ascent; 2) asymptotic convergence to the global optimum for softmax policy parameterization and PG, and a convergence rate with additional entropy regularization; and 3) dimension-free convergence to the global optimum for softmax policy parameterization and the Natural Policy Gradient (NPG) method with exact gradients. In the function approximation setting, we further analyze NPG with exact as well as inexact gradients under certain smoothness assumptions on the policy parameterization and establish rates of convergence in terms of the quality of the initial state distribution. One insight of this work is in formalizing how a favorable initial state distribution provides a means to circumvent worst-case exploration issues. Overall, these results place PG methods on a solid theoretical footing, analogous to the global convergence guarantees of iterative value function based algorithms.
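For the direct parameterization, the projected-ascent step requires a Euclidean projection onto the probability simplex; the standard sort-based projection is sketched below. The step direction uses raw Q-values for illustration only: the true policy gradient also scales them by the discounted state-visitation measure.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    lam = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + lam, 0.0)

def projected_pg_step(pi_s, q_s, eta):
    """One projected-ascent step on pi(.|s) under direct parameterization:
    move along the (illustrative) action values, project back to the simplex."""
    return project_simplex(pi_s + eta * q_s)

pi = np.array([0.25, 0.25, 0.5])
q = np.array([1.0, 0.0, 0.2])          # illustrative Q-values for one state
print(projected_pg_step(pi, q, eta=0.5))  # a valid distribution, shifted toward arm 0
```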

198 citations


Book
13 Jun 2020
TL;DR: In this article, the authors present a preliminary version of a book on the quantitative homogenization and large-scale regularity theory for elliptic equations in divergence form.
Abstract: This is a preliminary version of a book which presents the quantitative homogenization and large-scale regularity theory for elliptic equations in divergence form. The self-contained presentation gives new and simplified proofs of the core results proved in the last several years, including the algebraic convergence rate for the variational subadditive quantities, the large-scale Lipschitz and higher regularity estimates and Liouville-type results, and optimal quantitative estimates on the first-order correctors and their scaling limit to a Gaussian free field. The last chapter contains new results on the homogenization of the Dirichlet problem, including optimal quantitative estimates of the homogenization error and the two-scale expansion.

183 citations


Journal ArticleDOI
TL;DR: This study investigates the evolution process of a particle swarm optimization algorithm and proposes to incorporate more dynamic information into it, avoiding the accuracy loss caused by premature convergence without extra computational burden.
Abstract: High-dimensional and sparse (HiDS) matrices are frequently found in various industrial applications. A latent factor analysis (LFA) model is commonly adopted to extract useful knowledge from an HiDS matrix, whose parameter training mostly relies on a stochastic gradient descent (SGD) algorithm. However, an SGD-based LFA model's learning rate is hard to tune in real applications, making it vital to implement its self-adaptation. To address this critical issue, this study first carefully investigates the evolution process of a particle swarm optimization algorithm, and then proposes to incorporate more dynamic information into it to avoid the accuracy loss caused by premature convergence without extra computational burden, thereby innovatively achieving a novel position-transitional particle swarm optimization (P2SO) algorithm. It is subsequently adopted to implement a P2SO-based LFA (PLFA) model that builds a learning rate swarm applied to the same group of LFs. Thus, a PLFA model implements highly efficient learning rate adaptation as well as represents an HiDS matrix precisely. Experimental results on four HiDS matrices emerging from real applications demonstrate that, compared with an SGD-based LFA model, a PLFA model no longer suffers from a tedious and expensive tuning process of its learning rate to achieve higher prediction accuracy for missing data.
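For reference, the canonical PSO velocity/position update that P2SO builds on takes only a few lines; the position-transitional refinement with extra dynamic information is omitted here, and the inertia/acceleration constants are conventional defaults rather than the paper's settings.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One canonical PSO velocity/position update (sketch).

    P2SO augments this with additional dynamic (position-transitional)
    information to fight premature convergence; that part is omitted.
    """
    if rng is None:
        rng = np.random.default_rng()
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v
```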

169 citations


Proceedings Article
12 Jul 2020
TL;DR: It is shown that with the true gradient, policy gradient with a softmax parametrization converges at an $O(1/t)$ rate, with constants depending on the problem and initialization, which significantly expands the recent asymptotic convergence results.
Abstract: We make three contributions toward better understanding policy gradient methods in the tabular setting. First, we show that with the true gradient, policy gradient with a softmax parametrization converges at a $O(1/t)$ rate, with constants depending on the problem and initialization. This result significantly expands the recent asymptotic convergence results. The analysis relies on two findings: that the softmax policy gradient satisfies a Łojasiewicz inequality, and the minimum probability of an optimal action during optimization can be bounded in terms of its initial value. Second, we analyze entropy regularized policy gradient and show that it enjoys a significantly faster linear convergence rate $O(e^{-t})$ toward softmax optimal policy. This result resolves an open question in the recent literature. Finally, combining the above two results and additional new $\Omega(1/t)$ lower bound results, we explain how entropy regularization improves policy optimization, even with the true gradient, from the perspective of convergence rate. The separation of rates is further explained using the notion of non-uniform Łojasiewicz degree. These results provide a theoretical understanding of the impact of entropy and corroborate existing empirical studies.
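The entropy-regularized true-gradient update has a compact closed form in the bandit case. In the sketch below (3-arm bandit, illustrative constants), with q = r − τ log π, the exact gradient of the regularized objective with respect to the logits is π ⊙ (q − πᵀq).

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def entropy_reg_pg(r, tau=0.1, eta=0.4, iters=500):
    """True-gradient softmax policy gradient on a bandit with entropy
    regularization tau * H(pi). For q = r - tau*log(pi), the exact gradient
    of the regularized objective w.r.t. the logits is pi * (q - pi.q)."""
    theta = np.zeros_like(r)
    for _ in range(iters):
        pi = softmax(theta)
        q = r - tau * np.log(pi)
        theta += eta * pi * (q - pi @ q)  # exact (non-stochastic) gradient
    return softmax(theta)

print(entropy_reg_pg(np.array([1.0, 0.9, 0.1])))  # near-greedy, smoothed by tau
```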

160 citations


Journal ArticleDOI
TL;DR: A novel concept called stochastic momentum, aimed at decreasing the cost of performing the momentum step, is proposed, and it is proved that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum.
Abstract: In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent: convex quadratic problems. We prove global non-asymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates, and dual function values. We also show that the primal iterates converge at an accelerated linear rate in a somewhat weaker sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesaro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.
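On the convex quadratic setting the paper analyzes, the stochastic heavy ball method is a two-line iteration; the sketch below uses f(x) = (1/2n)||Ax − b||² with single-row sampling and illustrative step/momentum parameters.

```python
import numpy as np

def shb_quadratic(A, b, alpha=0.05, beta=0.9, iters=2000, rng=None):
    """Stochastic heavy ball on f(x) = (1/2n)||Ax - b||^2: an SGD step on a
    sampled row plus the momentum term beta * (x_k - x_{k-1})."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = A.shape
    x_prev = x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)                      # sample one data row
        g = (A[i] @ x - b[i]) * A[i]             # stochastic gradient
        x, x_prev = x - alpha * g + beta * (x - x_prev), x
    return x
```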

121 citations


Posted Content
TL;DR: Theoretical analysis of gradient TD (GTD) reinforcement learning methods implies that the GTD family of algorithms is comparable to, and may indeed be preferred over, existing least-squares TD methods for off-policy learning, due to their linear complexity.
Abstract: In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence, and to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has been not much work on finite-sample analysis for convergent off-policy reinforcement learning algorithms. In this paper, we formulate GTD methods as stochastic gradient algorithms with respect to a primal-dual saddle-point objective function, and then conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP, which offer improved convergence guarantees and acceleration, respectively. The results of our theoretical analysis show that the GTD family of algorithms are indeed comparable to the existing LSTD methods in off-policy learning scenarios.
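The GTD2 update in its primal-dual form is short enough to state directly; the sketch below is the unprojected iteration (projected GTD2 additionally projects theta and w onto bounded sets), with illustrative step sizes alpha and beta.

```python
def gtd2_step(theta, w, phi, phi_next, r, gamma, alpha, beta):
    """One GTD2 update (saddle-point form, sketch).

    theta: value-function weights; w: auxiliary weights estimating the
    expected TD-error direction. phi / phi_next are feature vectors of the
    current and next states sampled under the behavior policy."""
    delta = r + gamma * (phi_next @ theta) - phi @ theta   # TD error
    theta = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    w = w + beta * (delta - phi @ w) * phi
    return theta, w
```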

Journal ArticleDOI
TL;DR: The closed-loop attitude stabilization system is proved to be fixed-time stable, with the convergence time independent of initial states, and the attitude stabilization performance is robust to disturbance and uncertainties in inertia and actuators.
Abstract: A robust fixed-time control framework is presented to stabilize flexible spacecraft’s attitude system with external disturbance, uncertain parameters of inertia, and actuator uncertainty. As a stepping stone, a nonlinear system having faster fixed-time convergence property is preliminarily proposed by introducing a time-varying gain into the conventional fixed-time stability method. This gain improves the convergence rate. Then, a fixed-time observer is proposed to estimate the uncertain torque induced by disturbance, uncertain parameters of inertia, and actuator uncertainty. Fixed-time stability is ensured for the estimation error. Using this estimated knowledge and the full-states’ measurements, a nonsingular terminal sliding controller is finally synthesized. This is achieved via a nonsingular and faster terminal sliding surface with faster convergence rate. The closed-loop attitude stabilization system is proved to be fixed-time stable with the convergence time independent of initial states. The attitude stabilization performance is robust to disturbance and uncertainties in inertia and actuators. Simulation results are also shown to validate the attitude stabilization performance of this control approach.

Journal ArticleDOI
TL;DR: This article establishes a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates, and by relaxing the standard smoothness assumption to Hölder continuity of gradients.
Abstract: Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require imposing a nontrivial assumption on the uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify in practical implementations. In this article, we establish a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates, and by relaxing the standard smoothness assumption to Hölder continuity of gradients. In particular, we establish sufficient conditions for almost sure convergence as well as optimal convergence rates for SGD applied to both general nonconvex and gradient-dominated objective functions. A linear convergence is further derived in the case with zero variances.

Journal ArticleDOI
TL;DR: The GT-VR framework is a stochastic and decentralized framework for minimizing a finite sum of functions available over a network of nodes; it is particularly suitable for problems where large-scale, potentially private data cannot be collected or processed at a centralized server.
Abstract: This paper describes a novel algorithmic framework to minimize a finite-sum of functions available over a network of nodes. The proposed framework, that we call GT-VR, is stochastic and decentralized, and thus is particularly suitable for problems where large-scale, potentially private data, cannot be collected or processed at a centralized server. The GT-VR framework leads to a family of algorithms with two key ingredients: (i) local variance reduction, that enables estimating the local batch gradients from arbitrarily drawn samples of local data; and, (ii) global gradient tracking, which fuses the gradient information across the nodes. Naturally, combining different variance reduction and gradient tracking techniques leads to different algorithms of interest with valuable practical tradeoffs and design considerations. Our focus in this paper is on two instantiations of the GT-VR framework, namely GT-SAGA and GT-SVRG, that, similar to their centralized counterparts (SAGA and SVRG), exhibit a compromise between space and time. We show that both GT-SAGA and GT-SVRG achieve accelerated linear convergence for smooth and strongly convex problems and further describe the regimes in which they achieve non-asymptotic, network-independent linear convergence rates that are faster with respect to the existing decentralized first-order schemes. Moreover, we show that both algorithms achieve a linear speedup in such regimes compared to their centralized counterparts that process all data at a single node. Extensive simulations illustrate the convergence behavior of the corresponding algorithms.
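The common core of the GT-VR family is the gradient-tracking recursion; the sketch below is the deterministic version, where GT-SAGA/GT-SVRG would replace each local gradient call with a variance-reduced stochastic estimator. W is assumed doubly stochastic.

```python
import numpy as np

def gradient_tracking(W, grads, x0, alpha, iters):
    """Decentralized gradient tracking (deterministic sketch).

    W: doubly-stochastic mixing matrix (n x n); grads[i](x) returns node i's
    local gradient. The auxiliary variable Y tracks the network-average
    gradient, which is the ingredient GT-VR combines with variance reduction.
    """
    n = len(grads)
    X = np.tile(x0, (n, 1))                       # one copy of x per node
    G = np.stack([grads[i](X[i]) for i in range(n)])
    Y = G.copy()
    for _ in range(iters):
        X = W @ X - alpha * Y                     # consensus + descent step
        G_new = np.stack([grads[i](X[i]) for i in range(n)])
        Y = W @ Y + G_new - G                     # gradient-tracking update
        G = G_new
    return X.mean(axis=0)
```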

Journal ArticleDOI
TL;DR: This paper associates a forward-backward-forward dynamical system to a pseudo-monotone variational inequality, carries out an asymptotic analysis of the generated trajectories, and proves that linear convergence is guaranteed under strong pseudo-monotonicity.

Journal ArticleDOI
TL;DR: The governing partial differential equation generalizes the Hodgkin–Huxley, the Allen–Cahn and the Fisher–Kolmogorov–Petrovskii–Piscounov equations, and the proposed schemes are shown to be unconditionally stable.
Abstract: For the first time in the literature, semi-implicit spectral approximations for nonlinear Caputo time- and Riesz space-fractional diffusion equations with both smooth and non-smooth solutions are proposed. More precisely, the governing partial differential equation generalizes the Hodgkin–Huxley, the Allen–Cahn and the Fisher–Kolmogorov–Petrovskii–Piscounov equations. The schemes employ a Legendre-based Galerkin spectral method for the Riesz space-fractional derivative, and L1-type approximations with both uniform and graded meshes for the Caputo time-fractional derivative. More importantly, by using fractional Gronwall inequalities and their associated discrete forms, sharp error estimates are proved which show an enhancement in the convergence rate compared with the standard L1 approximation on uniform meshes. This analysis encompasses both uniform meshes and meshes that are graded in time, and unconditional stability is guaranteed. The numerical results that accompany our analysis confirm our theoretical error estimates, and give significant insights into the convergence behavior of our schemes for problems with smooth and non-smooth solutions.
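The L1 approximation on a nonuniform mesh, the time-stepping ingredient discussed above, is a short formula: on each subinterval the weakly singular kernel is integrated exactly and u' is replaced by a difference quotient. The sketch below checks it on u(t) = t over a graded mesh t_j = T(j/N)^r with illustrative parameters.

```python
import numpy as np
from math import gamma as Gamma

def l1_caputo(u, t, a):
    """L1 approximation of the Caputo derivative D_t^a u at t[-1], a in (0,1),
    on an arbitrary (e.g. graded) mesh t[0], ..., t[n]."""
    tn, s = t[-1], 0.0
    for j in range(len(t) - 1):
        # exact integral of (tn - s)^(-a) over [t_j, t_{j+1}], up to Gamma factor
        w = ((tn - t[j]) ** (1 - a) - (tn - t[j + 1]) ** (1 - a)) / Gamma(2 - a)
        s += w * (u[j + 1] - u[j]) / (t[j + 1] - t[j])   # difference quotient for u'
    return s

# Graded mesh t_j = T*(j/N)^r clusters points near t = 0, where solutions of
# time-fractional problems are typically non-smooth.
a, N, r, T = 0.5, 200, 2.0, 1.0
t = T * (np.arange(N + 1) / N) ** r
u = t                       # test function u(t) = t
print(l1_caputo(u, t, a))   # exact value t^(1-a)/Gamma(2-a) = 1.1284 at t = 1
```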

Journal ArticleDOI
TL;DR: In this paper, the homotopy analysis transform method (HATM) is used to solve the time-fractional Korteweg–de Vries (KdV) and Korteweg–de Vries–Burgers (KdVB) equations.

Journal ArticleDOI
TL;DR: On the basis of a regularization technique using the Moreau envelope, a class of first-order algorithms with inertial features involving both viscous and Hessian-driven dampings is extended to non-smooth convex functions with extended real values.
Abstract: In a Hilbert space setting, for convex optimization, we analyze the convergence rate of a class of first-order algorithms involving inertial features. They can be interpreted as discrete time versions of inertial dynamics involving both viscous and Hessian-driven dampings. The geometrical damping driven by the Hessian intervenes in the dynamics in the form $\nabla^2 f(x(t))\,\dot{x}(t)$. By treating this term as the time derivative of $\nabla f(x(t))$, this gives, in discretized form, first-order algorithms in time and space. In addition to the convergence properties attached to Nesterov-type accelerated gradient methods, the algorithms thus obtained are new and show a rapid convergence towards zero of the gradients. On the basis of a regularization technique using the Moreau envelope, we extend these methods to non-smooth convex functions with extended real values. The introduction of time scale factors makes it possible to further accelerate these algorithms. We also report numerical results on structured problems to support our theoretical findings.
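A schematic first-order discretization of such dynamics (not the paper's exact algorithm) replaces the Hessian term by a difference of consecutive gradients, since $\nabla^2 f(x(t))\,\dot{x}(t)$ is the time derivative of $\nabla f(x(t))$; the constants below are illustrative.

```python
import numpy as np

def inertial_hessian_damped(grad, x0, s=0.01, beta=1.0, alpha=3.0, iters=500):
    """Sketch of a first-order scheme with viscous and Hessian-driven damping.

    The Hessian term is discretized as the difference of consecutive
    gradients, so the method stays first-order; constants and the exact
    update differ from the paper's algorithms."""
    x_prev = x = np.asarray(x0, dtype=float)
    g_prev = grad(x)
    for k in range(1, iters + 1):
        g = grad(x)
        y = (x + (k / (k + alpha)) * (x - x_prev)   # inertial extrapolation
             - beta * np.sqrt(s) * (g - g_prev))    # Hessian-driven correction
        x_prev, x, g_prev = x, y - s * grad(y), g   # gradient step at y
    return x
```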

Journal ArticleDOI
TL;DR: This study improves the estimation accuracy for ARX models with missing outputs by introducing a modified Kalman filter, and improves the parameter estimation convergence rate by deriving a new multi-step-length formulation.

Journal ArticleDOI
TL;DR: In this paper, the numerical approximation of solutions to stochastic partial differential equations with additive spatial white noise on bounded domains in $\mathbb{R}^d$ is considered, where the differential operator is given by the fractional power $L^\beta$, $\beta \in (0,1)$, of an integer-order elliptic differential operator $L$ and is therefore nonlocal.
Abstract: The numerical approximation of solutions to stochastic partial differential equations with additive spatial white noise on bounded domains in $\mathbb{R}^d$ is considered. The differential operator is given by the fractional power $L^\beta$, $\beta \in (0,1)$, of an integer-order elliptic differential operator $L$ and is therefore nonlocal. Its inverse $L^{-\beta}$ is represented by a Bochner integral from the Dunford–Taylor functional calculus. By applying a quadrature formula to this integral representation, the inverse fractional-order operator $L^{-\beta}$ is approximated by a weighted sum of nonfractional resolvents $(I + e^{2y_j}L)^{-1}$ at certain quadrature nodes $y_j$. The resolvents are then discretized in space by a standard finite element method. This approach is combined with an approximation of the white noise, which is based only on the mass matrix of the finite element discretization. In this way an efficient numerical algorithm for computing samples of the approximate solution is obtained. For the resulting approximation the strong mean-square error is analyzed and an explicit rate of convergence is derived. Numerical experiments for $L = \kappa^2 - \Delta$, $\kappa > 0$, with homogeneous Dirichlet boundary conditions on the unit cube $(0,1)^d$ in $d = 1, 2, 3$ spatial dimensions for varying $\beta \in (0,1)$ attest to the theoretical results.
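A simplified version of this resolvent-quadrature idea can be sketched with the Balakrishnan formula $L^{-\beta} = \frac{\sin(\pi\beta)}{\pi}\int_0^\infty \lambda^{-\beta}(\lambda I + L)^{-1}\,d\lambda$, discretized after the substitution $\lambda = e^y$; the paper's sinc quadrature and finite element discretization are more refined than this dense 1D finite-difference toy.

```python
import numpy as np

def frac_inverse_apply(L, b, beta, K=100, h=0.15):
    """Apply L^{-beta}, beta in (0,1), via a truncated quadrature of the
    Balakrishnan integral (a simplified sketch of the resolvent-sum idea)."""
    n = L.shape[0]
    out = np.zeros_like(b)
    for j in range(-K, K + 1):
        y = j * h  # lambda = e^y, so the integrand becomes e^{(1-beta)y}(e^y I + L)^{-1}
        out += np.exp((1 - beta) * y) * np.linalg.solve(np.exp(y) * np.eye(n) + L, b)
    return (np.sin(np.pi * beta) / np.pi) * h * out

# 1D operator L = kappa^2 - Delta on (0,1) with Dirichlet BCs, finite differences.
n, kappa = 99, 1.0
hx = 1.0 / (n + 1)
L = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / hx**2 + kappa**2 * np.eye(n)
b = np.ones(n)
x = frac_inverse_apply(L, b, beta=0.5)
# Reference via eigendecomposition: L^{-beta} b = V diag(lam^{-beta}) V^T b
lam, V = np.linalg.eigh(L)
print(np.linalg.norm(x - V @ ((V.T @ b) / lam**0.5)))  # small quadrature error
```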

Journal ArticleDOI
TL;DR: This paper studies a class of distributed convex optimization problems in which each agent only has access to its own local convex objective function, and the estimate of each agent is restricted by both a coupling linear constraint and individual box constraints.
Abstract: This paper studies a class of distributed convex optimization problems by a set of agents, in which each agent only has access to its own local convex objective function and the estimate of each agent is restricted by both a coupling linear constraint and individual box constraints. Our focus is to devise a distributed primal-dual gradient algorithm for working out the problem over a sequence of time-varying general directed graphs. The communications among agents are assumed to be uniformly strongly connected. A column-stochastic mixing matrix and a fixed step-size are applied in the algorithm, which exactly steers all the agents to asymptotically converge to a global optimal solution. Based on the standard strong convexity and the smoothness assumptions of the objective functions, we show that the distributed algorithm is capable of driving the whole network to geometrically converge to an optimal solution of the convex optimization problem, provided that the step-size does not exceed some upper bound. We also give an explicit analysis for the convergence rate of the proposed optimization algorithm. Simulations on economic dispatch problems and demand response problems in power systems are performed to illustrate the effectiveness of the proposed optimization algorithm.

Journal ArticleDOI
TL;DR: This article proposes a general distributed asynchronous algorithmic framework whereby agents can update their local variables and communicate with their neighbors at any time, without any form of coordination; to the best of the authors' knowledge, this is the first distributed algorithm with a provable geometric convergence rate in such a general asynchronous setting.
Abstract: This article studies multiagent (convex and nonconvex) optimization over static digraphs. We propose a general distributed asynchronous algorithmic framework whereby 1) agents can update their local variables as well as communicate with their neighbors at any time, without any form of coordination; and 2) they can perform their local computations using (possibly) delayed, out-of-sync information from the other agents. Delays need not be known to the agent or obey any specific profile, and can also be time-varying (but bounded). The algorithm builds on a tracking mechanism that is robust against asynchrony (in the above sense), whose goal is to estimate locally the average of agents’ gradients. When applied to strongly convex functions, we prove that it converges at an R-linear (geometric) rate as long as the step-size is sufficiently small. A sublinear convergence rate is proved, when nonconvex problems and/or diminishing, uncoordinated step-sizes are considered. To the best of our knowledge, this is the first distributed algorithm with provable geometric convergence rate in such a general asynchronous setting. Preliminary numerical results demonstrate the efficacy of the proposed algorithm and validate our theoretical findings.

Journal ArticleDOI
TL;DR: For a class of priors that admit the structure of a mixture of product measures, the authors propose a novel prior mass condition, under which the variational approximation error of the mean-field class is dominated by the convergence rate of the true posterior.
Abstract: We study convergence rates of variational posterior distributions for nonparametric and high-dimensional inference. We formulate general conditions on prior, likelihood and variational class that characterize the convergence rates. Under similar “prior mass and testing” conditions considered in the literature, the rate is found to be the sum of two terms. The first term stands for the convergence rate of the true posterior distribution, and the second term is contributed by the variational approximation error. For a class of priors that admit the structure of a mixture of product measures, we propose a novel prior mass condition, under which the variational approximation error of the mean-field class is dominated by convergence rate of the true posterior. We demonstrate the applicability of our general results for various models, prior distributions and variational classes by deriving convergence rates of the corresponding variational posteriors.

Journal ArticleDOI
TL;DR: This paper considers a class of representative problems and proposes a novel iterative algorithm for parallel-in-time (PinT) computation that can solve the PDEs at all the discrete time points simultaneously via the recently proposed diagonalization technique.

Journal ArticleDOI
TL;DR: In this article, the Stancu variant of Bernstein–Kantorovich operators based on the shape parameter $\alpha$ was constructed, and the rate of convergence of these operators to any continuous function f(x) on $x \in [0,1]$ was investigated by means of a suitable modulus of continuity and a Voronovskaja-type approximation theorem.
Abstract: We construct the Stancu variant of Bernstein–Kantorovich operators based on the shape parameter $\alpha$. We investigate the rate of convergence of these operators to any continuous function f(x) on $x\in [0,1]$ by means of a suitable modulus of continuity, and prove a Voronovskaja-type approximation theorem. Moreover, we study other approximation properties of our new operators, such as weighted approximation as well as pointwise convergence. Finally, some illustrative graphics are provided here by our new Stancu-type Bernstein–Kantorovich operators in order to demonstrate the significance of our operators.
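The classical Bernstein–Kantorovich operator underlying these constructions is straightforward to evaluate numerically. The sketch below implements only the classical operator (the Stancu/α-shape variant modifies the basis and nodes) and illustrates convergence at a point where f is not differentiable.

```python
import numpy as np
from math import comb

def kantorovich(f, n, x, quad_pts=64):
    """Classical Bernstein-Kantorovich operator on [0,1]:
    K_n(f)(x) = sum_k C(n,k) x^k (1-x)^(n-k) * (n+1) * int_{k/(n+1)}^{(k+1)/(n+1)} f.
    """
    total = 0.0
    for k in range(n + 1):
        lo, hi = k / (n + 1), (k + 1) / (n + 1)
        t = lo + (np.arange(quad_pts) + 0.5) * (hi - lo) / quad_pts
        mean_f = f(t).mean()  # midpoint-rule average of f on the cell = (n+1)*integral
        total += comb(n, k) * x**k * (1 - x) ** (n - k) * mean_f
    return total

f = lambda t: np.abs(t - 0.5)          # continuous, not differentiable at 1/2
for n in (10, 40, 160):
    print(n, kantorovich(f, n, 0.5))   # approaches f(0.5) = 0 as n grows
```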

Journal ArticleDOI
TL;DR: A general approach is proposed to prove the concentration of variational approximations of fractional posteriors, with matrix completion and Gaussian VB as applications.
Abstract: While Bayesian methods are extremely popular in statistics and machine learning, their application to massive data sets is often challenging, when possible at all. The classical MCMC algorithms are prohibitively slow when both the model dimension and the sample size are large. Variational Bayesian methods aim at approximating the posterior by a distribution in a tractable family $\mathcal{F}$. Thus, MCMC are replaced by an optimization algorithm which is orders of magnitude faster. VB methods have been applied in such computationally demanding applications as collaborative filtering, image and video processing or NLP to name a few. However, despite nice results in practice, the theoretical properties of these approximations are not known. We propose a general oracle inequality that relates the quality of the VB approximation to the prior $\pi $ and to the structure of $\mathcal{F}$. We provide a simple condition that allows to derive rates of convergence from this oracle inequality. We apply our theory to various examples. First, we show that for parametric models with log-Lipschitz likelihood, Gaussian VB leads to efficient algorithms and consistent estimators. We then study a high-dimensional example: matrix completion, and a nonparametric example: density estimation.

Journal ArticleDOI
01 Nov 2020
TL;DR: Experimental results and comparisons demonstrate that the proposed Modified Equilibrium Optimizer can be considered a better metaheuristic optimization approach than the other compared algorithms.
Abstract: To alleviate the shortcomings of the standard Equilibrium Optimizer, a new improved algorithm called Modified Equilibrium Optimizer is proposed in this work. This algorithm utilizes the Gaussian mutation and an additional exploratory search mechanism based on the concept of population division and reconstruction. The population in each iteration of the proposed algorithm is constructed using these mechanisms and standard search procedure of the Equilibrium Optimizer. These strategies attempt to maintain the diversity of solutions during the search, so that the tendency of stagnation towards the sub-optimal solutions can be avoided and the convergence rate can be boosted to obtain more accurate optimal solutions. To validate and analyze the performance of the Modified Equilibrium Optimizer, a collection of 33 benchmark problems and four engineering design problems are adopted. Later, in the paper, the Modified Equilibrium Optimizer has been used to train multilayer perceptrons. The experimental results and comparison based on several metrics such as statistical analysis, scalability test, diversity analysis, performance index analysis and convergence analysis demonstrate that the proposed algorithm can be considered a better metaheuristic optimization approach than other compared algorithms.

Journal ArticleDOI
TL;DR: This paper considers the problem of data-driven iterative learning control (DDILC) for a class of nonaffine nonlinear systems subject to data quantization and sensor saturation, and proposes two novel quantized DDILC algorithms based on saturated and quantized information of system outputs.
Abstract: This paper considers the problem of data-driven iterative learning control (DDILC) for a class of nonaffine nonlinear systems subject to data quantization and sensor saturation. Two novel quantized DDILC (QDDILC) algorithms are proposed based on saturated and quantized information of system outputs. The convergence of the proposed QDDILC algorithms is strictly proved, and the effects of output saturation and data quantization are also analyzed. It is shown that sensor saturation does not change the convergence property, although it causes the convergence rate to slow down. For the QDDILC algorithm, data quantization causes the tracking error to converge to a bound depending on the quantization level. However, the modified QDDILC algorithm, which uses a different quantization scheme from the QDDILC algorithm, can ensure that the tracking error converges to zero. Illustrative simulations are exploited to verify the theoretical results.
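A generic P-type learning update from saturated, quantized output errors (a sketch of the mechanism, not the paper's exact QDDILC law) looks as follows; rho, the saturation level, and the quantization step delta are illustrative.

```python
import numpy as np

def qddilc_update(u, e, rho=0.5, sat_level=1.0, delta=0.05):
    """One P-type iterative learning update from saturated, quantized errors.

    The output error e is first saturated by the sensor, then uniformly
    quantized with level delta, and the control input for the next
    iteration is corrected accordingly."""
    e_sat = np.clip(e, -sat_level, sat_level)   # sensor saturation
    e_q = delta * np.round(e_sat / delta)       # uniform quantizer
    return u + rho * e_q                        # learning update

u = np.zeros(5)
e = np.array([1.7, -0.3, 0.08, 0.5, -2.2])      # illustrative tracking-error profile
print(qddilc_update(u, e))
```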

Proceedings Article
10 Jun 2020
TL;DR: The theory for strongly convex objectives tightly matches the known lower bounds for both RR and SO and substantiates the common practical heuristic of shuffling once or only a few times; fast convergence of the Shuffle-Once algorithm, which shuffles the data only once, is also proved.
Abstract: Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its sibling Stochastic Gradient Descent (SGD), RR is usually faster in practice and enjoys significant popularity in convex and non-convex optimization. The convergence rate of RR has attracted substantial attention recently and, for strongly convex and smooth functions, it was shown to converge faster than SGD if 1) the stepsize is small, 2) the gradients are bounded, and 3) the number of epochs is large. We remove these 3 assumptions, improve the dependence on the condition number from $\kappa^2$ to $\kappa$ (resp. from $\kappa$ to $\sqrt{\kappa}$) and, in addition, show that RR has a different type of variance. We argue through theory and experiments that the new variance type gives an additional justification of the superior performance of RR. To go beyond strong convexity, we present several results for non-strongly convex and non-convex objectives. We show that in all cases, our theory improves upon existing literature. Finally, we prove fast convergence of the Shuffle-Once (SO) algorithm, which shuffles the data only once, at the beginning of the optimization process. Our theory for strongly-convex objectives tightly matches the known lower bounds for both RR and SO and substantiates the common practical heuristic of shuffling once or only a few times. As a byproduct of our analysis, we also get new results for the Incremental Gradient algorithm (IG), which does not shuffle the data at all.
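The sampling schemes compared in the paper differ only in how indices are drawn each epoch; a small generator makes the distinction explicit (SGD with replacement is included for contrast).

```python
import numpy as np

def index_stream(n, epochs, scheme, seed=0):
    """Yield sample indices for one run: RR reshuffles every epoch, SO
    shuffles once at the start, IG uses the fixed order, and SGD samples
    with replacement."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)  # used as-is by SO for every epoch
    for _ in range(epochs):
        if scheme == "RR":
            perm = rng.permutation(n)  # fresh shuffle each epoch
        for t in range(n):
            if scheme == "SGD":
                yield rng.integers(n)
            elif scheme == "IG":
                yield t
            else:  # "RR" or "SO"
                yield perm[t]
```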

Journal ArticleDOI
TL;DR: This paper shows that for a sequence of over-relaxation parameters that do not satisfy Nesterov’s rule, one can still expect relatively fast convergence properties for the objective function.
Abstract: In this paper we study the convergence of an Inertial Forward-Backward algorithm with a particular choice of over-relaxation term. In particular, we show that for a sequence of over-relaxation parameters that do not satisfy Nesterov’s rule, one can still expect some relatively fast convergence properties for the objective function. In addition, we complement this work by studying the convergence of the algorithm in the case where the proximal operator is computed inexactly, and we give sufficient conditions on these errors in order to obtain convergence properties for the objective function.
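A minimal sketch of the inertial forward-backward iteration on a lasso instance: f = ½||Ax − b||² supplies the forward gradient step, g = λ||x||₁ the backward (soft-thresholding) prox, and alphas(k) supplies the over-relaxation sequence, which need not follow Nesterov's rule.

```python
import numpy as np

def inertial_fb(A, b, lam, s, alphas, iters=300):
    """Inertial forward-backward for min f(x) + g(x) with f = 0.5||Ax-b||^2
    and g = lam*||x||_1 (prox = soft-thresholding)."""
    soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
    x_prev = x = np.zeros(A.shape[1])
    for k in range(1, iters + 1):
        y = x + alphas(k) * (x - x_prev)          # inertial extrapolation
        x_prev, x = x, soft(y - s * A.T @ (A @ y - b), s * lam)
    return x

rng = np.random.default_rng(1)
A, b = rng.normal(size=(40, 80)), rng.normal(size=40)
s = 1.0 / np.linalg.norm(A, 2) ** 2               # step size below 1/L
x = inertial_fb(A, b, lam=0.1, s=s, alphas=lambda k: (k - 1) / (k + 2))
print(np.count_nonzero(np.abs(x) > 1e-8))         # sparse solution
```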