Topic

Rate of convergence

About: Rate of convergence is a research topic. Over the lifetime, 31,257 publications have been published within this topic, receiving 795,334 citations. The topic is also known as: convergence rate.


Papers
Journal ArticleDOI
TL;DR: It is proved that the random consensus value is, in expectation, the average of the initial node measurements and that it can be made arbitrarily close to this value in the mean squared error sense, under a balanced connectivity model and by trading off convergence speed with accuracy of the computation.
Abstract: Motivated by applications to wireless sensor, peer-to-peer, and ad hoc networks, we study distributed broadcasting algorithms for exchanging information and computing in an arbitrarily connected network of nodes. Specifically, we study a broadcasting-based gossiping algorithm to compute the (possibly weighted) average of the initial measurements of the nodes at every node in the network. We show that the broadcast gossip algorithm converges almost surely to a consensus. We prove that the random consensus value is, in expectation, the average of the initial node measurements and that it can be made arbitrarily close to this value in the mean squared error sense, under a balanced connectivity model and by trading off convergence speed with accuracy of the computation. We provide theoretical and numerical results on the mean squared error performance and the convergence rate, and study the effect of the "mixing parameter" on the convergence rate of the broadcast gossip algorithm. The results indicate that the mean squared error strictly decreases through iterations until the consensus is achieved. Finally, we assess and compare the communication cost of the broadcast gossip algorithm to achieve a given distance to consensus through theoretical and numerical results.
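
For illustration, a minimal sketch of broadcast gossip rounds in Python, assuming a generic variant in which a uniformly random node broadcasts its current value and every neighbor mixes it in with a mixing parameter gamma; the exact update weights and wake-up model in the paper may differ.

import random

def broadcast_gossip(x, neighbors, gamma=0.5, iterations=10_000, seed=0):
    """x: list of initial node measurements; neighbors: dict node -> list of neighbor ids."""
    rng = random.Random(seed)
    x = list(x)
    n = len(x)
    for _ in range(iterations):
        i = rng.randrange(n)            # a uniformly random node wakes up and broadcasts x[i]
        for j in neighbors[i]:          # every neighbor mixes its value with the broadcast
            x[j] = gamma * x[j] + (1.0 - gamma) * x[i]
        # the broadcaster keeps its own value unchanged in this illustrative variant
    return x

# Example: a ring of 6 nodes; the consensus value is, in expectation, the initial average.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(broadcast_gossip([0, 1, 2, 3, 4, 5], ring, gamma=0.7))

A gamma close to 1 makes each neighbor move only slightly toward the broadcaster, which tends to slow convergence while keeping the consensus closer to the initial average, mirroring the speed/accuracy tradeoff discussed in the abstract.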

516 citations

Journal ArticleDOI
TL;DR: A simple and efficient adaptive FEM is constructed for elliptic partial differential equations (PDEs) that achieves a linear rate of convergence without any preliminary mesh adaptation or explicit knowledge of constants.
Abstract: Data oscillation is intrinsic information missed by the averaging process associated with finite element methods (FEM) regardless of quadrature. Ensuring a reduction rate of data oscillation, together with an error reduction based on a posteriori error estimators, we construct a simple and efficient adaptive FEM for elliptic partial differential equations (PDEs) with a linear rate of convergence, without any preliminary mesh adaptation or explicit knowledge of constants. Any prescribed error tolerance is thus achieved in a finite number of steps. A number of numerical experiments in two and three dimensions yield quasi-optimal meshes along with a competitive performance.
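
For intuition, a toy 1D sketch of the standard adaptive loop SOLVE -> ESTIMATE -> MARK -> REFINE, with piecewise linear interpolation of u(x) = sqrt(x) standing in for the finite element solve and the local interpolation error acting as the estimator; this only caricatures the paradigm and is not the paper's estimator or marking rule.

import math

def adapt(u, mesh, tol=1e-3, theta=0.5, max_iter=40):
    for _ in range(max_iter):
        # ESTIMATE: midpoint deviation from the linear interpolant on each element
        eta = []
        for a, b in zip(mesh[:-1], mesh[1:]):
            mid = 0.5 * (a + b)
            eta.append(abs(u(mid) - 0.5 * (u(a) + u(b))))
        total = math.sqrt(sum(e * e for e in eta))
        if total < tol:
            break
        # MARK: bulk (Doerfler) marking -- pick elements carrying a theta-fraction of the error
        order = sorted(range(len(eta)), key=lambda k: -eta[k])
        marked, acc = set(), 0.0
        for k in order:
            marked.add(k)
            acc += eta[k] ** 2
            if acc >= theta * total ** 2:
                break
        # REFINE: bisect marked elements
        new_mesh = []
        for k, (a, b) in enumerate(zip(mesh[:-1], mesh[1:])):
            new_mesh.append(a)
            if k in marked:
                new_mesh.append(0.5 * (a + b))
        new_mesh.append(mesh[-1])
        mesh = new_mesh
    return mesh, total

mesh, err = adapt(lambda x: math.sqrt(x), [0.0, 0.5, 1.0])
print(len(mesh), err)   # refinement concentrates near x = 0, where sqrt is least smooth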

515 citations

Journal ArticleDOI
TL;DR: Three novel algorithms are developed to estimate the regression coefficients via the Lasso when the training data are distributed across different agents and their communication to a central processing unit is prohibited due to, e.g., communication cost or privacy reasons.
Abstract: The Lasso is a popular technique for joint estimation and continuous variable selection, especially well-suited for sparse and possibly under-determined linear regression problems. This paper develops algorithms to estimate the regression coefficients via Lasso when the training data are distributed across different agents, and their communication to a central processing unit is prohibited due to, e.g., communication cost or privacy reasons. A motivating application is explored in the context of wireless communications, whereby sensing cognitive radios collaborate to estimate the radio-frequency power spectrum density. Attaining different tradeoffs between complexity and convergence speed, three novel algorithms are obtained after reformulating the Lasso into a separable form, which is iteratively minimized using the alternating-direction method of multipliers so as to gain the desired degree of parallelization. Interestingly, the per-agent estimate updates are given by simple soft-thresholding operations, and inter-agent communication overhead remains at an affordable level. Without exchanging elements from the different training sets, the local estimates consent to the global Lasso solution, i.e., the fit that would be obtained if the entire data set were centrally available. Numerical experiments with both simulated and real data demonstrate the merits of the proposed distributed schemes, corroborating their convergence and global optimality. The ideas in this paper can be easily extended for the purpose of fitting related models in a distributed fashion, including the adaptive Lasso, elastic net, fused Lasso and nonnegative garrote.
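
To make the soft-thresholding structure concrete, a hedged sketch of consensus ADMM for a Lasso fit split across agents, in its generic textbook form rather than the exact recursions of any of the paper's three algorithms; variable names and the penalty scaling are assumptions for illustration.

import numpy as np

def soft_threshold(v, kappa):
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def distributed_lasso(blocks, lam=0.1, rho=1.0, iters=200):
    """blocks: list of (A_k, b_k) pairs, one per agent."""
    p = blocks[0][0].shape[1]
    K = len(blocks)
    x = [np.zeros(p) for _ in range(K)]   # local estimates
    u = [np.zeros(p) for _ in range(K)]   # scaled dual variables
    z = np.zeros(p)                       # consensus variable
    # pre-factor the per-agent quadratic subproblems
    factors = [np.linalg.inv(A.T @ A + rho * np.eye(p)) for A, _ in blocks]
    for _ in range(iters):
        for k, (A, b) in enumerate(blocks):
            x[k] = factors[k] @ (A.T @ b + rho * (z - u[k]))   # local least-squares step
        # shared update: a single soft-thresholding of the averaged local estimates
        z = soft_threshold(np.mean([x[k] + u[k] for k in range(K)], axis=0), lam / (rho * K))
        for k in range(K):
            u[k] += x[k] - z
    return z

# Example: two agents, sparse ground truth
rng = np.random.default_rng(0)
w = np.array([1.5, 0.0, 0.0, -2.0, 0.0])
def make_block():
    A = rng.normal(size=(30, 5))
    return A, A @ w + 0.01 * rng.normal(size=30)
blocks = [make_block() for _ in range(2)]
print(np.round(distributed_lasso(blocks, lam=0.5), 2))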

514 citations

Journal ArticleDOI
TL;DR: In this paper, the authors studied the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions.
Abstract: This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the application, sparsity may occur a priori on the covariance matrix, its inverse or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order (s_n log p_n / n)^{1/2}, where s_n is the number of nonzero elements, p_n is the size of the covariance matrix and n is the sample size. This explicitly spells out that the contribution of high dimensionality is merely a logarithmic factor. The conditions on the rate at which the tuning parameter λ_n goes to 0 have been made explicit and compared under different penalties. As a result, for the L_1 penalty, to guarantee sparsistency and the optimal rate of convergence, the number of nonzero elements should be small: s_n' = O(p_n) at most, among O(p_n^2) parameters, for estimating a sparse covariance or correlation matrix, sparse precision or inverse correlation matrix, or sparse Cholesky factor, where s_n' is the number of nonzero elements in the off-diagonal entries. On the other hand, using the SCAD or hard-thresholding penalty functions, there is no such restriction.
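
For reference, the standard textbook forms of the three penalty functions compared above (L_1, hard thresholding, and SCAD with the customary choice a = 3.7), written as small Python helpers; the paper's exact parametrization may differ.

import numpy as np

def l1_penalty(theta, lam):
    return lam * np.abs(theta)

def hard_threshold_penalty(theta, lam):
    # p(theta) = lam^2 - (lam - |theta|)_+^2 : constant once |theta| >= lam
    return lam ** 2 - np.maximum(lam - np.abs(theta), 0.0) ** 2

def scad_penalty(theta, lam, a=3.7):
    t = np.abs(theta)
    small = lam * t
    middle = -(t ** 2 - 2 * a * lam * t + lam ** 2) / (2 * (a - 1))
    large = (a + 1) * lam ** 2 / 2
    return np.where(t <= lam, small, np.where(t <= a * lam, middle, large))

theta = np.linspace(-5, 5, 11)
print(l1_penalty(theta, 1.0))
print(hard_threshold_penalty(theta, 1.0))
print(scad_penalty(theta, 1.0))   # flat beyond a*lam = 3.7, so large entries are not over-penalized

Because the SCAD and hard-thresholding penalties level off for large entries, they do not keep shrinking large nonzero parameters the way L_1 does, which is the informal reason the restriction on s_n' disappears for them.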

509 citations

Posted Content
TL;DR: This paper investigates the optimality of SGD in a stochastic setting and shows that for smooth problems the algorithm attains the optimal O(1/T) rate; however, for non-smooth problems the convergence rate with averaging might really be Ω(log(T)/T), and this is not just an artifact of the analysis.
Abstract: Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T) rate. This might lead one to believe that standard SGD is suboptimal, and maybe should even be replaced as a method of choice. In this paper, we investigate the optimality of SGD in a stochastic setting. We show that for smooth problems, the algorithm attains the optimal O(1/T) rate. However, for non-smooth problems, the convergence rate with averaging might really be Ω(log(T)/T), and this is not just an artifact of the analysis. On the flip side, we show that a simple modification of the averaging step suffices to recover the O(1/T) rate, and no other change of the algorithm is necessary. We also present experimental results which support our findings, and point out open problems.
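
As a sketch of the kind of averaging modification in question, the snippet below runs SGD with 1/(mu*t) step sizes on a strongly convex, non-smooth toy objective and reports the last iterate, the full average, and a suffix average over the last half of the iterates; the toy problem and this particular suffix scheme are assumptions for illustration, not the paper's construction or lower-bound example.

import random

def sgd_averaging(T=100_000, mu=1.0, noise=1.0, seed=0):
    # f(x) = 0.5 * mu * x**2 + |x| is strongly convex and non-smooth at its minimizer 0
    rng = random.Random(seed)
    x = 10.0
    iterates = []
    for t in range(1, T + 1):
        subgrad = mu * x + (1.0 if x > 0 else -1.0 if x < 0 else 0.0)
        g = subgrad + rng.gauss(0.0, noise)   # noisy subgradient oracle
        x -= g / (mu * t)                     # step size 1/(mu * t)
        iterates.append(x)
    full_avg = sum(iterates) / T
    suffix_avg = sum(iterates[T // 2:]) / (T - T // 2)   # average only the last half
    return x, full_avg, suffix_avg

last, full, suffix = sgd_averaging()
print(f"last iterate {last:+.5f}  full average {full:+.5f}  suffix average {suffix:+.5f}")

The only change between the two estimators is which iterates enter the average, which is the sense in which the abstract's fix requires "no other change of the algorithm".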

509 citations


Network Information
Related Topics (5)
Partial differential equation: 70.8K papers, 1.6M citations (89% related)
Markov chain: 51.9K papers, 1.3M citations (88% related)
Optimization problem: 96.4K papers, 2.1M citations (88% related)
Differential equation: 88K papers, 2M citations (88% related)
Nonlinear system: 208.1K papers, 4M citations (88% related)
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2024    1
2023    693
2022    1,530
2021    2,129
2020    2,036
2019    1,995