
Rate of convergence

About: Rate of convergence is a research topic. Over its lifetime, 31,257 publications on this topic have been published, receiving 795,334 citations. The topic is also known as: convergence rate.


Papers
Journal ArticleDOI
TL;DR: A method is described that includes the effect of density changes on the potentials, properly scales the changes in kinetic energy, and increases the rate of convergence by more than an order of magnitude for large systems.
Abstract: Iterative diagonalization of the Hamiltonian matrix is required to solve very large electronic-structure problems. Present algorithms are limited in their convergence rates at low wave numbers by stability problems associated with large changes in the Hartree potential, and at high wave numbers with large changes in the kinetic energy. A new method is described which includes the effect of density changes on the potentials and properly scales the changes in kinetic energy. The use of this method has increased the rate of convergence by over an order of magnitude for large problems.
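
For intuition, here is a minimal numerical sketch of the idea the abstract describes: a kinetic-energy preconditioner that damps high-wave-number error components in an iterative eigensolver. This is not the paper's algorithm; the model Hamiltonian, the basis size, and the unit shift in the preconditioner are all illustrative.

    import numpy as np

    # Sketch: preconditioned Rayleigh-Ritz iteration for the lowest eigenpair
    # of H = T + V in a 1-D plane-wave basis. High-G residual components are
    # divided by ~T_G, the scaling the abstract argues for; everything here
    # is a toy model, not the paper's method.
    n = 64
    G = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / n)   # plane-wave wave numbers
    T = np.diag(0.5 * G**2)                        # kinetic energy, diagonal in G
    rng = np.random.default_rng(0)
    V = rng.normal(scale=0.1, size=(n, n))
    V = 0.5 * (V + V.T)                            # symmetric model potential
    H = T + V

    x = rng.normal(size=n)
    x /= np.linalg.norm(x)
    for it in range(50):
        lam = x @ H @ x                            # Rayleigh quotient
        r = H @ x - lam * x                        # residual
        p = r / (0.5 * G**2 + 1.0)                 # kinetic-energy preconditioner
        Q, _ = np.linalg.qr(np.stack([x, p], axis=1))
        w, U = np.linalg.eigh(Q.T @ H @ Q)         # Rayleigh-Ritz on span{x, p}
        x = Q @ U[:, 0]

    print("approx lowest eigenvalue:", x @ H @ x)
    print("exact lowest eigenvalue: ", np.linalg.eigvalsh(H)[0])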

744 citations

Posted Content
TL;DR: In this paper, the stochastic average gradient (SAG) method was proposed to optimize the sum of a finite number of smooth convex functions, which achieves a faster convergence rate than black-box SG methods.
Abstract: We propose the stochastic average gradient (SAG) method for optimizing the sum of a finite number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG method's iteration cost is independent of the number of terms in the sum. However, by incorporating a memory of previous gradient values, the SAG method achieves a faster convergence rate than black-box SG methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in general, and when the sum is strongly convex, the convergence rate is improved from the sublinear O(1/k) to a linear rate of the form O(p^k) for p < 1. Further, in many cases the convergence rate of the new method is also faster than black-box deterministic gradient methods, in terms of the number of gradient evaluations. Numerical experiments indicate that the new algorithm often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.
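
For concreteness, here is a hedged sketch of the SAG update on a least-squares instance; the data, step size, and iteration count are illustrative, not from the paper. The point to notice is that each iteration computes one term's gradient yet steps along the average of all stored gradients, so the per-iteration cost stays independent of the number of terms.

    import numpy as np

    # Sketch of SAG for min_x (1/n) sum_i f_i(x), f_i(x) = 0.5*(a_i @ x - b_i)^2.
    rng = np.random.default_rng(0)
    n, d = 200, 5
    A = rng.normal(size=(n, d))
    b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

    x = np.zeros(d)
    grad_memory = np.zeros((n, d))              # last gradient seen for each f_i
    grad_sum = np.zeros(d)                      # running sum of stored gradients
    step = 1.0 / np.max(np.sum(A**2, axis=1))   # ~1/L, a practical choice

    for it in range(20000):
        i = rng.integers(n)                     # sample one term
        g_new = A[i] * (A[i] @ x - b[i])        # gradient of f_i at the current x
        grad_sum += g_new - grad_memory[i]      # refresh the memory in O(d)
        grad_memory[i] = g_new
        x -= step * grad_sum / n                # step along the averaged memory

    print("residual norm:", np.linalg.norm(A @ x - b))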

744 citations

Journal ArticleDOI
TL;DR: Two new temporal difference algorithms based on the theory of linear least-squares function approximation, LS TD and RLS TD, are introduced, with probability-one convergence proved when LS TD is used with a function approximator linear in the adjustable parameters.
Abstract: We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this algorithm, Recursive Least-Squares TD (RLS TD). Although these new TD algorithms require more computation per time-step than do Sutton's TD(λ) algorithms, they are more efficient in a statistical sense because they extract more information from training experiences. We describe a simulation experiment showing the substantial improvement in learning rate achieved by RLS TD in an example Markov prediction problem. To quantify this improvement, we introduce the TD error variance of a Markov chain, ω_TD, and experimentally conclude that the convergence rate of a TD algorithm depends linearly on ω_TD. In addition to converging more rapidly, LS TD and RLS TD do not have control parameters, such as a learning rate parameter, thus eliminating the possibility of achieving poor performance by an unlucky choice of parameters.
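
The least-squares structure of LS TD can be sketched in a few lines. The Markov chain, tabular features, and discount factor below are illustrative choices, not the paper's experimental setup; note that, as the abstract emphasizes, no learning-rate parameter appears.

    import numpy as np

    # Sketch of LSTD(0) for Markov prediction with linear (here tabular) features:
    # accumulate A = sum phi(s)(phi(s) - gamma*phi(s'))^T and b = sum phi(s) r(s),
    # then solve A theta = b.
    rng = np.random.default_rng(0)
    n_states, gamma = 5, 0.9
    P = rng.dirichlet(np.ones(n_states), size=n_states)   # random transition matrix
    r = rng.normal(size=n_states)                         # reward at each state
    phi = np.eye(n_states)                                # tabular features

    A = np.zeros((n_states, n_states))
    b = np.zeros(n_states)
    s = 0
    for t in range(50000):
        s_next = rng.choice(n_states, p=P[s])
        A += np.outer(phi[s], phi[s] - gamma * phi[s_next])
        b += phi[s] * r[s]
        s = s_next

    theta = np.linalg.solve(A, b)           # value estimate: V(s) ≈ phi[s] @ theta
    V_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)
    print("max value error:", np.max(np.abs(theta - V_true)))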

741 citations

Journal ArticleDOI
TL;DR: This work introduces a decentralized scheme for least-squares and best linear unbiased estimation (BLUE), establishes its convergence in the presence of communication noise, and uses the method of multipliers in conjunction with a block coordinate descent approach to decompose the resultant algorithm into a set of simpler tasks suitable for distributed implementation.
Abstract: We deal with distributed estimation of deterministic vector parameters using ad hoc wireless sensor networks (WSNs). We cast the decentralized estimation problem as the solution of multiple constrained convex optimization subproblems. Using the method of multipliers in conjunction with a block coordinate descent approach we demonstrate how the resultant algorithm can be decomposed into a set of simpler tasks suitable for distributed implementation. Different from existing alternatives, our approach does not require the centralized estimator to be expressible in a separable closed form in terms of averages, thus allowing for decentralized computation even of nonlinear estimators, including maximum likelihood estimators (MLE) in nonlinear and non-Gaussian data models. We prove that these algorithms have guaranteed convergence to the desired estimator when the sensor links are assumed ideal. Furthermore, our decentralized algorithms exhibit resilience in the presence of receiver and/or quantization noise. In particular, we introduce a decentralized scheme for least-squares and best linear unbiased estimation (BLUE) and establish its convergence in the presence of communication noise. Our algorithms also exhibit potential for higher convergence rate with respect to existing schemes. Corroborating simulations demonstrate the merits of the novel distributed estimation algorithms.
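
As a hedged illustration of the overall recipe (method of multipliers plus simple per-sensor subproblems), here is a consensus-ADMM sketch for decentralized least squares. The network size, penalty rho, and data are invented for the example, and the global averaging step stands in for whatever in-network consensus protocol a real WSN would use.

    import numpy as np

    # Each sensor i holds (A_i, b_i); all minimize sum_i ||A_i x_i - b_i||^2
    # subject to the consensus constraint x_i = z.
    rng = np.random.default_rng(0)
    d, sensors, rho = 4, 8, 1.0
    x_true = rng.normal(size=d)
    A = [rng.normal(size=(10, d)) for _ in range(sensors)]
    b = [Ai @ x_true + 0.05 * rng.normal(size=10) for Ai in A]

    x = [np.zeros(d) for _ in range(sensors)]
    u = [np.zeros(d) for _ in range(sensors)]
    z = np.zeros(d)

    for it in range(100):
        for i in range(sensors):            # local regularized LS at each sensor
            x[i] = np.linalg.solve(A[i].T @ A[i] + rho * np.eye(d),
                                   A[i].T @ b[i] + rho * (z - u[i]))
        z = np.mean([x[i] + u[i] for i in range(sensors)], axis=0)  # consensus
        for i in range(sensors):            # multiplier (dual) update
            u[i] += x[i] - z

    print("error vs true parameter:", np.linalg.norm(z - x_true))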

740 citations

Journal ArticleDOI
TL;DR: This paper shows that global linear convergence can be guaranteed under the assumptions of strong convexity and Lipschitz gradient on one of the two functions, along with certain rank assumptions on A and B.
Abstract: The formulation min_{x,y} f(x) + g(y), subject to Ax + By = b, where f and g are extended-value convex functions, arises in many application areas such as signal processing, imaging and image processing, statistics, and machine learning, either naturally or after variable splitting. In many common problems, one of the two objective functions is strictly convex and has Lipschitz continuous gradient. On this kind of problem, a very effective approach is the alternating direction method of multipliers (ADM or ADMM), which solves a sequence of f/g-decoupled subproblems. However, its effectiveness has not been matched by a provably fast rate of convergence; only sublinear rates such as O(1/k) and O(1/k^2) were recently established in the literature, though the O(1/k) rates do not require strong convexity. This paper shows that global linear convergence can be guaranteed under the assumptions of strong convexity and Lipschitz gradient on one of the two functions, along with certain rank assumptions on A and B. The result applies to various generalizations of ADM that allow the subproblems to be solved faster and less exactly in certain manners. The derived rate of convergence also provides some theoretical guidance for optimizing the ADM parameters. In addition, this paper makes meaningful extensions to the existing global convergence theory of ADM generalizations.
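
A minimal sketch of ADM/ADMM on this template may help, instantiated as a lasso problem with f(x) = 0.5*||Dx - c||^2, g(y) = lam*||y||_1, A = I, B = -I, b = 0; the data, lam, and rho below are illustrative. With D of full column rank, f is strongly convex with a Lipschitz gradient, which is the setting the linear-rate result addresses.

    import numpy as np

    # ADMM for min f(x) + g(y) s.t. x - y = 0, i.e., the lasso.
    rng = np.random.default_rng(0)
    m, n, lam, rho = 50, 20, 0.1, 1.0
    D = rng.normal(size=(m, n))
    c = rng.normal(size=m)

    x, y, u = np.zeros(n), np.zeros(n), np.zeros(n)   # u is the scaled multiplier
    x_step = np.linalg.inv(D.T @ D + rho * np.eye(n)) # cache the f-subproblem solve
    for it in range(300):
        x = x_step @ (D.T @ c + rho * (y - u))        # f-subproblem (least squares)
        y = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # prox of g
        u += x - y                                    # multiplier update

    print("constraint violation ||x - y||:", np.linalg.norm(x - y))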

734 citations


Network Information
Related Topics (5)

Partial differential equation: 70.8K papers, 1.6M citations (89% related)
Markov chain: 51.9K papers, 1.3M citations (88% related)
Optimization problem: 96.4K papers, 2.1M citations (88% related)
Differential equation: 88K papers, 2M citations (88% related)
Nonlinear system: 208.1K papers, 4M citations (88% related)
Performance
Metrics
No. of papers in the topic in previous years:

Year    Papers
2024    1
2023    693
2022    1,530
2021    2,129
2020    2,036
2019    1,995