scispace - formally typeset
Topic

Rate of convergence

About: Rate of convergence is a research topic. Over its lifetime, 31,257 publications have been published within this topic, receiving 795,334 citations. The topic is also known as: convergence rate.
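As a quick illustration of what these papers quantify (a generic sketch, not drawn from any listed paper): an iteration converges linearly when the error shrinks by a roughly constant factor per step, and quadratically when the error is roughly squared per step. Here Newton's method for computing √2 (quadratic) is contrasted with a damped fixed-point iteration (linear); all function names are illustrative.

```python
import math

def fixed_point_errors(step, x0, target, iters):
    # Track the error |x_k - target| for the iteration x_{k+1} = step(x_k).
    x, errs = x0, []
    for _ in range(iters):
        x = step(x)
        errs.append(abs(x - target))
    return errs

root = math.sqrt(2.0)

# Newton's method for x^2 = 2: quadratic convergence,
# the error is roughly squared at each step.
newton = fixed_point_errors(lambda x: 0.5 * (x + 2.0 / x), 1.0, root, 6)

# Damped residual iteration x_{k+1} = x_k - 0.1*(x_k^2 - 2): linear
# convergence, the error shrinks by a roughly constant factor per step.
linear = fixed_point_errors(lambda x: x - 0.1 * (x * x - 2.0), 1.0, root, 6)
```

After six steps the quadratic iteration is at machine precision while the linear one has only reduced the error by a modest constant factor each step.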


Papers
Posted Content
TL;DR: SignSGD can get the best of both worlds: compressed gradients and SGD-level convergence rate, and the momentum counterpart of signSGD is able to match the accuracy and convergence speed of Adam on deep Imagenet models.
Abstract: Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. The relative $\ell_1/\ell_2$ geometry of gradients, noise and curvature informs whether signSGD or SGD is theoretically better suited to a particular problem. On the practical side, we find that the momentum counterpart of signSGD is able to match the accuracy and convergence speed of Adam on deep Imagenet models. We extend our theory to the distributed setting, where the parameter server uses majority vote to aggregate gradient signs from each worker, enabling 1-bit compression of worker-server communication in both directions. Using a theorem by Gauss, we prove that majority vote can achieve the same reduction in variance as full precision distributed SGD. Thus, there is great promise for sign-based optimisation schemes to achieve fast communication and fast convergence. Code to reproduce experiments is to be found at this https URL

275 citations
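The two ingredients the abstract describes are simple to sketch (a toy setup of my own, not the authors' code): each worker transmits only the sign of its stochastic gradient, and the parameter server aggregates the 1-bit signs by majority vote.

```python
import numpy as np

def signsgd_step(w, grad, lr=0.01):
    # Each coordinate moves by a fixed amount in the direction of the
    # gradient's sign, so only 1 bit per coordinate is communicated.
    return w - lr * np.sign(grad)

def majority_vote(worker_grads):
    # Server-side aggregation: sum the workers' 1-bit signs and keep
    # only the sign of the total, i.e. the majority direction.
    return np.sign(sum(np.sign(g) for g in worker_grads))

# Toy run: minimise f(w) = ||w||^2 with noisy per-worker gradients.
rng = np.random.default_rng(0)
w = np.ones(10)
for _ in range(500):
    grads = [2 * w + rng.normal(scale=0.5, size=10) for _ in range(5)]
    w = signsgd_step(w, majority_vote(grads), lr=0.01)
```

Even with noisy gradients, the majority-voted sign points toward the minimum often enough for the iterate to settle near zero, at a scale set by the fixed step size.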

Journal ArticleDOI
TL;DR: In this paper, the authors show that strong uniform times always exist and relate this method to coupling and Fourier analysis for Markov chains with strong symmetry properties, in particular random walks on finite groups.

274 citations

Journal ArticleDOI
TL;DR: In this paper, it is shown that if the regressor vector is constructed out of radial basis function approximants, it will be persistently exciting, provided a kind of "ergodic" condition is satisfied.
Abstract: In this paper, identification algorithms whose convergence and rate of convergence hinge on the regressor vector being persistently exciting are discussed. It is then shown that if the regressor vector is constructed out of radial basis function approximants, it will be persistently exciting, provided a kind of "ergodic" condition is satisfied. In addition, bounds on parameters associated with the persistently exciting regressor vector are provided; these parameters are connected with both the convergence and rates of convergence of the algorithms involved.

274 citations
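To make the condition concrete, here is a numerical sketch (my own construction, not the paper's): build the regressor vector from Gaussian radial basis functions along a trajectory that keeps revisiting the whole input range, and measure excitation via the smallest eigenvalue of the accumulated outer products, which persistent excitation requires to be bounded away from zero.

```python
import numpy as np

def rbf_regressor(x, centers, width=1.0):
    # Regressor vector built from Gaussian radial basis functions.
    return np.exp(-((x - centers) ** 2) / (2 * width ** 2))

def excitation_level(xs, centers):
    # Smallest eigenvalue of sum_t phi(x_t) phi(x_t)^T; a strictly
    # positive value is the finite-sample analogue of the
    # persistent-excitation condition.
    M = np.zeros((len(centers), len(centers)))
    for x in xs:
        p = rbf_regressor(x, centers)
        M += np.outer(p, p)
    return float(np.linalg.eigvalsh(M).min())

centers = np.linspace(-2.0, 2.0, 5)
# A trajectory that keeps sweeping the input range, loosely in the
# spirit of the "ergodic" condition mentioned in the abstract.
xs = 2.0 * np.sin(0.37 * np.arange(200))
lam = excitation_level(xs, centers)
```

A trajectory that parks at a single point would instead make the accumulated matrix rank-deficient and the smallest eigenvalue zero.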

Journal ArticleDOI
TL;DR: In this article, the authors give an explicit expression for the rate of convergence for fully indecomposable matrices and compare the measure with some well known alternatives, including PageRank.
Abstract: As long as a square nonnegative matrix $A$ contains sufficient nonzero elements, then the Sinkhorn-Knopp algorithm can be used to balance the matrix, that is, to find a diagonal scaling of $A$ that is doubly stochastic. It is known that the convergence is linear, and an upper bound has been given for the rate of convergence for positive matrices. In this paper we give an explicit expression for the rate of convergence for fully indecomposable matrices. We describe how balancing algorithms can be used to give a measure of web page significance. We compare the measure with some well known alternatives, including PageRank. We show that, with an appropriate modification, the Sinkhorn-Knopp algorithm is a natural candidate for computing the measure on enormous data sets.

273 citations
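The Sinkhorn-Knopp iteration itself is simple to sketch (a generic illustration for a strictly positive matrix, not the paper's implementation): alternately rescale columns and rows until the scaled matrix is doubly stochastic.

```python
import numpy as np

def sinkhorn_knopp(A, tol=1e-12, max_iter=10_000):
    # Find positive diagonal scalings r, c so that diag(r) @ A @ diag(c)
    # has all row and column sums equal to 1 (doubly stochastic).
    r = np.ones(A.shape[0])
    P = A
    for _ in range(max_iter):
        c = 1.0 / (A.T @ r)        # rescale so column sums head to 1
        r = 1.0 / (A @ c)          # rescale so row sums are exactly 1
        P = r[:, None] * A * c     # current balanced candidate
        if np.abs(P.sum(axis=0) - 1.0).max() < tol:
            break
    return P

rng = np.random.default_rng(1)
P = sinkhorn_knopp(rng.random((6, 6)) + 0.1)
```

For positive matrices the convergence is linear, which is the setting whose rate the paper makes explicit for fully indecomposable matrices.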

Journal ArticleDOI
TL;DR: In this paper, the cross-covariance function for ARCH models is studied and bounds for the crosscovarisance function are derived and explicit formulae are obtained in special cases.
Abstract: The paper studies the change-point problem and the cross-covariance function for ARCH models. Bounds for the cross-covariance function are derived and explicit formulae are obtained in special cases. Consistency of a cusum type change-point estimator is proved and its rate of convergence is established. A Hajek-Renyi type inequality is also proved. Results are obtained under weak moment assumptions.

273 citations
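As a generic illustration of a cusum-type change-point estimator (shown here for a simple mean shift with independent noise, not the ARCH setting the paper actually treats; all names are mine):

```python
import numpy as np

def cusum_changepoint(x):
    # Classic cusum estimate of a change point: the index k maximising
    # |S_k - (k/n) S_n|, where S_k is the running sum of the series.
    n = len(x)
    S = np.cumsum(x)
    k = np.arange(1, n + 1)
    stat = np.abs(S - (k / n) * S[-1])
    return int(np.argmax(stat[:-1])) + 1   # number of pre-change samples

# Toy series: mean jumps from 0 to 3 after 120 observations.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0.0, 1.0, 120), rng.normal(3.0, 1.0, 80)])
est = cusum_changepoint(x)
```

The paper's contribution is proving consistency and the rate of convergence of such an estimator under the weak moment assumptions appropriate for ARCH models.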


Network Information
Related Topics (5)
- Partial differential equation: 70.8K papers, 1.6M citations (89% related)
- Markov chain: 51.9K papers, 1.3M citations (88% related)
- Optimization problem: 96.4K papers, 2.1M citations (88% related)
- Differential equation: 88K papers, 2M citations (88% related)
- Nonlinear system: 208.1K papers, 4M citations (88% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2024    1
2023    693
2022    1,530
2021    2,129
2020    2,036
2019    1,995