Rate of convergence

About: Rate of convergence is a research topic. Over the lifetime, 31,257 publications have been published within this topic, receiving 795,334 citations. The topic is also known as: convergence rate.


Papers
Journal ArticleDOI
TL;DR: This paper studies numerical convergence, consistency, and statistical rates of convergence of boosting with early stopping, when it is carried out over the linear span of a family of basis functions, and gives a rigorous proof that for a linearly separable problem, AdaBoost becomes an L1-margin maximizer when left to run to convergence.
Abstract: Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to minimize empirically a loss function in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples depending on the previous iterations. An unusual regularization technique, early stopping, is employed based on cross-validation or a test set. This paper studies numerical convergence, consistency and statistical rates of convergence of boosting with early stopping, when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the loss function over the linear span. Using the numerical convergence result, we find early-stopping strategies under which boosting is shown to be consistent based on i.i.d. samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are also presented to illustrate the relevance of our theoretical results for providing insights into practical aspects of boosting. As a side product, these results also reveal the importance of restricting the greedy search step-sizes, as is known in practice through the work of Friedman and others. Moreover, our results lead to a rigorous proof that for a linearly separable problem, AdaBoost with ε → 0 step-size becomes an L1-margin maximizer when left to run to convergence.

451 citations
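The greedy procedure the abstract describes is easy to sketch. Below is a minimal illustration, assuming a fixed design matrix whose columns play the role of the basis functions: L2-boosting picks the basis function most correlated with the current residual, takes a restricted (small) step along it, and stops early based on a held-out set. The function name, step size, and patience rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def l2_boost(X, y, X_val, y_val, step=0.05, max_iters=2000, patience=50):
    """Greedy L2-boosting over the linear span of the columns of X, with a
    restricted step size and validation-based early stopping. A hedged
    sketch of the general scheme, not the paper's exact algorithm."""
    beta = np.zeros(X.shape[1])
    best_beta, best_val, since_best = beta.copy(), np.inf, 0
    for _ in range(max_iters):
        r = y - X @ beta                      # residual of the current additive fit
        corr = X.T @ r                        # inner product with each basis function
        j = int(np.argmax(np.abs(corr)))      # greedy choice of base learner
        beta[j] += step * np.sign(corr[j])    # restricted step size, as the theory requires
        val = np.mean((y_val - X_val @ beta) ** 2)
        if val < best_val:
            best_val, best_beta, since_best = val, beta.copy(), 0
        else:
            since_best += 1
            if since_best >= patience:        # early stopping acts as the regularizer
                break
    return best_beta

# toy usage on synthetic data with three active basis functions
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
w = np.zeros(50); w[:3] = [2.0, -1.0, 1.0]
y = X @ w + 0.1 * rng.standard_normal(200)
X_val = rng.standard_normal((100, 50))
y_val = X_val @ w + 0.1 * rng.standard_normal(100)
print(np.round(l2_boost(X, y, X_val, y_val)[:5], 2))
```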

Journal ArticleDOI
TL;DR: Exact computable rates of convergence for Gaussian target distributions are obtained and different random and non-random updating strategies and blocking combinations are compared using the rates.
Abstract: In this paper many convergence issues concerning the implementation of the Gibbs sampler are investigated. Exact computable rates of convergence for Gaussian target distributions are obtained. Different random and non-random updating strategies and blocking combinations are compared using the rates. The effect of dimensionality and correlation structure on the convergence rates is studied. Some examples are considered to demonstrate the results. For a Gaussian image analysis problem several updating strategies are described and compared. For problems in Bayesian linear models several possible parameterizations are analysed in terms of their convergence rates, characterizing the optimal choice.

448 citations
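To see what an exact computable Gaussian rate looks like, consider the standard bivariate special case: for a deterministic-scan Gibbs sampler targeting a correlation-ρ bivariate normal, the mean contracts geometrically at the classical per-sweep rate ρ². The sketch below (chain count, starting point, and fit window are arbitrary choices) estimates that rate empirically; it illustrates the flavor of the paper's results, not its general analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.9                          # correlation of the bivariate Gaussian target
n_sweeps, n_chains = 40, 100_000

# Deterministic-scan Gibbs for (x, y) ~ N(0, [[1, rho], [rho, 1]]):
#   x | y ~ N(rho * y, 1 - rho^2),   y | x ~ N(rho * x, 1 - rho^2)
x = np.full(n_chains, 5.0)         # start far from the mean to expose the decay
y = np.full(n_chains, 5.0)
sd = np.sqrt(1.0 - rho ** 2)
means = []
for _ in range(n_sweeps):
    x = rho * y + sd * rng.standard_normal(n_chains)
    y = rho * x + sd * rng.standard_normal(n_chains)
    means.append(x.mean())

# E[x after sweep t] decays like rho^(2t), so the log-slope gives the rate.
means = np.array(means)
t = np.arange(5, 20)
slope = np.polyfit(t, np.log(np.abs(means[5:20])), 1)[0]
print(f"empirical per-sweep rate ~ {np.exp(slope):.3f}, theory rho^2 = {rho**2:.3f}")
```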

Book ChapterDOI
01 Dec 2004
TL;DR: This paper derives convergence rates for Q-learning, showing that for a polynomial learning rate the convergence rate is polynomial in 1/(1-γ), while for a linear learning rate, one which is 1/t at time t, the convergence rate has an exponential dependence on 1/(1-γ).
Abstract: In this paper we derive convergence rates for Q-learning. We show an interesting relationship between the convergence rate and the learning rate used in Q-learning. For a polynomial learning rate, one which is 1/t^ω at time t where ω ∈ (1/2, 1), we show that the convergence rate is polynomial in 1/(1-γ), where γ is the discount factor. In contrast we show that for a linear learning rate, one which is 1/t at time t, the convergence rate has an exponential dependence on 1/(1-γ). In addition we show a simple example that proves this exponential behavior is inherent for linear learning rates.

446 citations
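The gap between the two schedules is visible even on a deterministic one-state, one-action toy MDP, where Q* = 1/(1-γ) and the Q-learning update is noise-free, so only the schedule's effect on the contraction remains. This stripped-down sketch (the MDP, constants, and the choice ω = 0.6 are illustrative assumptions) mirrors the qualitative conclusion, not the paper's stochastic analysis.

```python
gamma = 0.9
q_star = 1.0 / (1.0 - gamma)       # one state, one action, reward 1 per step

def run(schedule, n_steps=200_000):
    """Q-learning on the trivial MDP: q <- q + alpha_t * (1 + gamma*q - q).
    The error contracts by (1 - alpha_t * (1 - gamma)) each step, so the
    learning-rate schedule alone determines the convergence rate."""
    q = 0.0
    for t in range(1, n_steps + 1):
        q += schedule(t) * (1.0 + gamma * q - q)
    return q

# For this recursion the linear 1/t schedule leaves error ~ t^-(1-gamma),
# painfully slow for gamma near 1; 1/t^0.6 drives it to machine precision.
for name, sched in [("linear 1/t", lambda t: 1.0 / t),
                    ("polynomial 1/t^0.6", lambda t: t ** -0.6)]:
    print(f"{name:20s} |q - q*| = {abs(run(sched) - q_star):.2e}")
```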

Journal ArticleDOI
TL;DR: This paper analyzes optimized Schwarz methods for symmetric positive definite problems and shows their relation to other modern domain decomposition methods like the new Finite Element Tearing and Interconnect (FETI) variants.
Abstract: Optimized Schwarz methods are a new class of Schwarz methods with greatly enhanced convergence properties. They converge uniformly faster than classical Schwarz methods, and their convergence rates are asymptotically much better than the convergence rates of classical Schwarz methods if the overlap is of the order of the mesh parameter, which is often the case in practical applications. They achieve this performance by using new transmission conditions between subdomains which greatly enhance the information exchange between subdomains and are motivated by the physics of the underlying problem. In this paper we analyze these new methods for symmetric positive definite problems and show their relation to other modern domain decomposition methods like the new Finite Element Tearing and Interconnect (FETI) variants.

446 citations
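As a point of reference for the convergence claims above, here is a sketch of the classical alternating Schwarz iteration (Dirichlet transmission conditions) for a 1D Poisson problem with two overlapping subdomains, the baseline whose overlap-dependent rate the optimized variants improve by changing the transmission conditions. The grid size, overlaps, and right-hand side are arbitrary choices for illustration.

```python
import numpy as np

n = 200                            # interior grid points of -u'' = f on (0, 1)
h = 1.0 / (n + 1)
f = np.ones(n)                     # simple right-hand side, u(0) = u(1) = 0

def solve_dirichlet(f_loc, left_bc, right_bc):
    """Finite-difference solve of -u'' = f on a subinterval with given
    Dirichlet boundary values at its two endpoints."""
    m = len(f_loc)
    A = (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
    b = f_loc.copy()
    b[0] += left_bc / h**2
    b[-1] += right_bc / h**2
    return np.linalg.solve(A, b)

u_exact = solve_dirichlet(f, 0.0, 0.0)         # single-domain reference solution

for overlap in (4, 16):                        # half-overlap, in grid points
    a, c = n // 2 - overlap, n // 2 + overlap  # subdomains [0, c) and [a, n)
    u = np.zeros(n)
    errs = []
    for _ in range(20):
        # subdomain 1 takes its right boundary value from the current iterate
        u[:c] = solve_dirichlet(f[:c], 0.0, u[c])
        # subdomain 2 takes its left boundary value from the fresh subdomain-1 solve
        u[a:] = solve_dirichlet(f[a:], u[a - 1], 0.0)
        errs.append(np.max(np.abs(u - u_exact)))
    print(f"overlap {2 * overlap:2d}h: error {errs[-1]:.2e}, "
          f"per-sweep rate ~ {errs[-1] / errs[-2]:.3f}")
```

More overlap buys a faster rate, which is exactly the dependence the optimized transmission conditions are designed to weaken.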

Proceedings Article
06 Dec 2010
TL;DR: Singular value projection (SVP) as discussed by the authors is a simple and fast algorithm for rank minimization under affine constraints (ARMP); the authors show that SVP recovers the minimum-rank solution for affine constraints that satisfy a restricted isometry property (RIP).
Abstract: Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm, SVP (Singular Value Projection), for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum-rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees a geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton step for our SVP framework to speed up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP: the problem of low-rank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem.

445 citations
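The core SVP iteration is a projected gradient method: take a gradient step on the observed entries, then project onto the set of rank-k matrices with a truncated SVD. The sketch below applies it to a toy matrix-completion instance; the step size, iteration count, and problem dimensions are ad hoc assumptions, and the Newton-step variant is not shown.

```python
import numpy as np

def svp_complete(M_obs, mask, k, step=1.0, n_iters=300):
    """Singular Value Projection for matrix completion: gradient step on
    0.5 * ||mask * (X - M)||_F^2, then projection onto rank-k matrices
    via truncated SVD. A hedged sketch of the projected-gradient scheme."""
    X = np.zeros_like(M_obs)
    for _ in range(n_iters):
        G = mask * (X - M_obs)              # gradient on the observed entries
        Y = X - step * G                    # gradient step
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U[:, :k] * s[:k]) @ Vt[:k]     # best rank-k approximation of Y
    return X

# toy usage: recover a random rank-2 matrix from ~50% of its entries
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 50))
mask = (rng.random(M.shape) < 0.5).astype(float)
X = svp_complete(mask * M, mask, k=2)
print("relative error:", np.linalg.norm(X - M) / np.linalg.norm(M))
```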


Network Information
Related Topics (5)
Partial differential equation: 70.8K papers, 1.6M citations, 89% related
Markov chain: 51.9K papers, 1.3M citations, 88% related
Optimization problem: 96.4K papers, 2.1M citations, 88% related
Differential equation: 88K papers, 2M citations, 88% related
Nonlinear system: 208.1K papers, 4M citations, 88% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2024    1
2023    693
2022    1,530
2021    2,129
2020    2,036
2019    1,995