
Showing papers on "Rate of convergence" published in 2012


Journal ArticleDOI
TL;DR: Surprisingly, for certain classes of objective functions, the convergence estimates obtained for the proposed huge-scale optimization methods are better than the standard worst-case bounds for deterministic algorithms.
Abstract: In this paper we propose new methods for solving huge-scale optimization problems. For problems of this size, even the simplest full-dimensional vector operations are very expensive. Hence, we propose to apply an optimization technique based on random partial updates of the decision variables. For these methods, we prove global estimates for the rate of convergence. Surprisingly, for certain classes of objective functions, our results are better than the standard worst-case bounds for deterministic algorithms. We present constrained and unconstrained versions of the method and its accelerated variant. Our numerical tests confirm the high efficiency of this technique on problems of very large size.
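A minimal sketch of the random-partial-update idea (illustrative only, not the paper's exact method; the quadratic objective, step rule, and problem sizes are assumptions): randomized coordinate descent that touches a single randomly chosen coordinate per iteration, stepping by the coordinate-wise Lipschitz constant.

```python
import numpy as np

def random_coordinate_descent(A, b, iters=20000, seed=0):
    """Minimize f(x) = 0.5*x'Ax - b'x (A symmetric positive definite) by
    exactly minimizing over one randomly chosen coordinate per iteration."""
    rng = np.random.default_rng(seed)
    n = len(b)
    diag = np.diag(A)              # coordinate-wise Lipschitz constants
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.integers(n)        # random partial update: one coordinate only
        g_i = A[i] @ x - b[i]      # i-th partial derivative
        x[i] -= g_i / diag[i]      # exact minimization along coordinate i
    return x

# Usage: a small SPD system, checked against the direct solve.
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)
b = rng.standard_normal(50)
print(np.linalg.norm(random_coordinate_descent(A, b) - np.linalg.solve(A, b)))
```

Each step costs one row-vector product rather than a full matrix-vector product, which is the point at huge scale.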

1,454 citations


Journal ArticleDOI
TL;DR: This paper considers regularized block multiconvex optimization, where the feasible set and objective function are generally nonconvex but convex in each block of variables and proposes a generalized block coordinate descent method.
Abstract: This paper considers regularized block multiconvex optimization, where the feasible set and objective function are generally nonconvex but convex in each block of variables. It also accepts nonconvex blocks and requires these blocks to be updated by proximal minimization. We review some interesting applications and propose a generalized block coordinate descent method. Under certain conditions, we show that any limit point satisfies the Nash equilibrium conditions. Furthermore, we establish global convergence and estimate the asymptotic convergence rate of the method by assuming a property based on the Kurdyka--Łojasiewicz inequality. The proposed algorithms are tested on nonnegative matrix and tensor factorization, as well as matrix and tensor recovery from incomplete observations. The tests include synthetic data and hyperspectral data, as well as image sets from the CBCL and ORL databases. Compared to the existing state-of-the-art algorithms, the proposed algorithms demonstrate superior performance in ...
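For concreteness, a minimal two-block instance of such a scheme (a sketch with assumed prox-linear updates and parameters, not the authors' algorithm): alternating projected-gradient steps for nonnegative matrix factorization, where the objective is nonconvex overall but convex in each factor.

```python
import numpy as np

def block_coordinate_descent(M, r, iters=200, seed=0):
    """Alternating prox-linear updates for min ||M - X@Y||_F^2 s.t. X, Y >= 0.
    Each block update is a projected gradient step with a Lipschitz step size."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X, Y = rng.random((m, r)), rng.random((r, n))
    for _ in range(iters):
        Lx = np.linalg.norm(Y @ Y.T, 2)                    # Lipschitz const. wrt X
        X = np.maximum(X - ((X @ Y - M) @ Y.T) / Lx, 0.0)  # gradient step + projection
        Ly = np.linalg.norm(X.T @ X, 2)                    # Lipschitz const. wrt Y
        Y = np.maximum(Y - (X.T @ (X @ Y - M)) / Ly, 0.0)
    return X, Y

M = np.abs(np.random.default_rng(1).standard_normal((30, 20)))
X, Y = block_coordinate_descent(M, r=5)
print(np.linalg.norm(M - X @ Y) / np.linalg.norm(M))   # relative fit
```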

1,153 citations


Journal ArticleDOI
TL;DR: This note focuses on the Douglas-Rachford ADM scheme proposed by Glowinski and Marrocco, and aims at providing a simple approach to estimating its convergence rate in terms of the iteration number.
Abstract: Alternating direction methods (ADMs) have been well studied in the literature, and they have found many efficient applications in various fields. In this note, we focus on the Douglas-Rachford ADM scheme proposed by Glowinski and Marrocco, and we aim at providing a simple approach to estimating its convergence rate in terms of the iteration number. The linearized version of this ADM scheme, which is known as the split inexact Uzawa method in the image processing literature, is also discussed.
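As a concrete illustration of an alternating direction scheme (a generic ADMM sketch in the standard scaled-dual form; the lasso instance and parameters are assumptions, not this note's setting):

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, iters=300):
    """ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1 subject to x = z."""
    n = A.shape[1]
    x = z = u = np.zeros(n)
    Q = np.linalg.inv(A.T @ A + rho * np.eye(n))   # x-update system, factored once
    Atb = A.T @ b
    for _ in range(iters):
        x = Q @ (Atb + rho * (z - u))                                   # x-step
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0)   # soft threshold
        u = u + x - z                                                   # dual update
    return z

A = np.random.default_rng(0).standard_normal((40, 60))
b = A[:, :3] @ np.array([1.0, -2.0, 3.0])
print(np.round(admm_lasso(A, b, lam=1.0)[:6], 2))   # sparse estimate
```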

923 citations


Proceedings Article
03 Dec 2012
TL;DR: In this paper, a new stochastic gradient method was proposed to optimize the sum of a finite set of smooth functions, where the sum is strongly convex, with a memory of previous gradient values in order to achieve a linear convergence rate.
Abstract: We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in terms of optimizing the training error and reducing the test error quickly.
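A sketch of the gradient-memory idea for ridge-regularized least squares (an illustrative SAG-style loop with an assumed step-size heuristic, not the authors' code): keep the last gradient seen for each sample and step along their running average.

```python
import numpy as np

def sag(X, y, lam=0.1, epochs=30, seed=0):
    """SAG-style method for min (1/n) sum_i 0.5*(x_i'w - y_i)^2 + 0.5*lam*||w||^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    step = 1.0 / (np.max(np.sum(X**2, axis=1)) + lam)   # ~1/L heuristic
    w = np.zeros(d)
    grads = np.zeros((n, d))        # memory: last gradient seen for each sample
    g_avg = np.zeros(d)
    for _ in range(epochs * n):
        i = rng.integers(n)
        g_new = (X[i] @ w - y[i]) * X[i] + lam * w
        g_avg += (g_new - grads[i]) / n   # cheap update of the average gradient
        grads[i] = g_new
        w -= step * g_avg
    return w

X = np.random.default_rng(1).standard_normal((200, 5))
y = X @ np.array([1, 2, 3, 4, 5.0])
print(np.round(sag(X, y), 2))
```

Each iteration costs the same as one SGD step, yet the averaged memory is what permits the linear rate claimed above.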

781 citations


Journal ArticleDOI
TL;DR: The incremental cost consensus (ICC) algorithm is able to solve the conventional centralized economic dispatch problem in a distributed manner; its mathematical formulation is presented, and several case studies show that differences in network topology influence the convergence rate of the ICC algorithm.
Abstract: In a smart grid, effective distributed control algorithms could be embedded in distributed controllers to properly allocate electrical power among connected buses autonomously. By selecting the incremental cost of each generation unit as the consensus variable, the incremental cost consensus (ICC) algorithm is able to solve the conventional centralized economic dispatch problem in a distributed manner. The mathematical formulation of the algorithm has been presented in this paper. The results of several case studies have also been presented to show that the difference between network topologies will influence the convergence rate of the ICC algorithm.
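A toy sketch of the consensus iteration that ICC builds on (illustrative only; the actual ICC update also enforces the generation-demand balance, which is omitted here): each bus repeatedly mixes its incremental-cost estimate with its neighbors' through a row-stochastic weight matrix determined by the network topology.

```python
import numpy as np

# Hypothetical 4-bus line topology with row-stochastic mixing weights.
W = np.array([
    [0.50, 0.50, 0.00, 0.00],
    [0.50, 0.25, 0.25, 0.00],
    [0.00, 0.25, 0.25, 0.50],
    [0.00, 0.00, 0.50, 0.50],
])
lam = np.array([10.0, 14.0, 8.0, 12.0])   # initial incremental costs
for _ in range(60):
    lam = W @ lam                          # each bus talks to neighbors only
print(np.round(lam, 3))                    # entries approach a common value
```

The mixing speed is governed by the second-largest eigenvalue modulus of W, which is how the network topology enters the convergence rate.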

624 citations


Journal ArticleDOI
TL;DR: The accelerated stochastic approximation (AC-SA) algorithm based on Nesterov’s optimal method for smooth CP is introduced, and it is shown that the AC-SA algorithm can achieve the aforementioned lower bound on the rate of convergence for SCO.
Abstract: This paper considers an important class of convex programming (CP) problems, namely, the stochastic composite optimization (SCO), whose objective function is given by the summation of general nonsmooth and smooth stochastic components. Since SCO covers non-smooth, smooth and stochastic CP as certain special cases, a valid lower bound on the rate of convergence for solving these problems is known from the classic complexity theory of convex programming. Note, however, that optimization algorithms achieving this lower bound had not previously been developed. In this paper, we show that the simple mirror-descent stochastic approximation method exhibits the best-known rate of convergence for solving these problems. Our major contribution is to introduce the accelerated stochastic approximation (AC-SA) algorithm based on Nesterov’s optimal method for smooth CP (Nesterov in Doklady AN SSSR 269:543–547, 1983; Nesterov in Math Program 103:127–152, 2005), and to show that the AC-SA algorithm can achieve the aforementioned lower bound on the rate of convergence for SCO. To the best of our knowledge, it is also the first universally optimal algorithm in the literature for solving non-smooth, smooth and stochastic CP problems. We illustrate the significant advantages of the AC-SA algorithm over existing methods in the context of solving a special but broad class of stochastic programming problems.

531 citations


Proceedings Article
26 Jun 2012
TL;DR: In this article, the optimality of SGD in a stochastic setting was investigated, and it was shown that SGD attains the optimal O(1/T) rate for smooth problems.
Abstract: Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T) rate. This might lead one to believe that standard SGD is suboptimal, and maybe should even be replaced as a method of choice. In this paper, we investigate the optimality of SGD in a stochastic setting. We show that for smooth problems, the algorithm attains the optimal O(1/T) rate. However, for non-smooth problems, the convergence rate with averaging might really be Ω(log(T)/T), and this is not just an artifact of the analysis. On the flip side, we show that a simple modification of the averaging step suffices to recover the O(1/T) rate, and no other change of the algorithm is necessary. We also present experimental results which support our findings, and point out open problems.
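One such averaging modification is easy to sketch (a one-dimensional strongly convex toy with assumed noise; suffix averaging, i.e. averaging only the last fraction of the iterates, is one variant of the kind discussed):

```python
import numpy as np

def sgd_suffix_average(T=10000, lam=1.0, suffix=0.5, seed=0):
    """SGD with step 1/(lam*t) on a strongly convex toy problem with noisy
    gradients g = lam*(x - 1) + noise; return the mean of the last iterates."""
    rng = np.random.default_rng(seed)
    x, kept = 0.0, []
    start = int((1 - suffix) * T)
    for t in range(1, T + 1):
        g = lam * (x - 1.0) + rng.standard_normal()   # unbiased noisy gradient
        x -= g / (lam * t)
        if t > start:
            kept.append(x)
    return np.mean(kept)    # averaging only a suffix avoids the log(T) penalty

print(sgd_suffix_average())   # close to the minimizer x* = 1
```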

471 citations


Journal ArticleDOI
TL;DR: A new efficient NeNMF solver is presented that applies Nesterov's optimal gradient method to alternately optimize one factor with the other fixed, and that can be used to solve $\ell_1$-norm, $\ell_2$-norm and manifold regularized NMF with the optimal convergence rate.
Abstract: Nonnegative matrix factorization (NMF) is a powerful matrix decomposition technique that approximates a nonnegative matrix by the product of two low-rank nonnegative matrix factors. It has been widely applied to signal processing, computer vision, and data mining. Traditional NMF solvers include the multiplicative update rule (MUR), the projected gradient method (PG), the projected nonnegative least squares (PNLS), and the active set method (AS). However, they suffer from one or more of the following three problems: slow convergence rate, numerical instability, and nonconvergence. In this paper, we present a new efficient solver, NeNMF, to simultaneously overcome the aforementioned problems. It applies Nesterov's optimal gradient method to alternately optimize one factor with the other fixed. In particular, at each iteration round, the matrix factor is updated by using the PG method performed on a smartly chosen search point, where the step size is determined by the Lipschitz constant. Since NeNMF does not use the time-consuming line search and converges optimally at rate $O(1/k^2)$ in optimizing each matrix factor, it is superior to MUR and PG in terms of both efficiency and approximation accuracy. Compared to PNLS and AS, which suffer from the numerical instability problem in the worst case, NeNMF overcomes this deficiency. In addition, NeNMF can be used to solve $\ell_1$-norm, $\ell_2$-norm and manifold regularized NMF with the optimal convergence rate. Numerical experiments on both synthetic and real-world datasets show the efficiency of NeNMF for NMF and its variants compared to representative NMF solvers. Extensive experiments on document clustering suggest the effectiveness of NeNMF.
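A sketch of the inner step such a solver uses (an illustrative reimplementation from the description above, not the authors' code): Nesterov-accelerated projected gradient for one factor H with W fixed, with the step size set by the Lipschitz constant.

```python
import numpy as np

def nesterov_nnls(W, V, H0, iters=50):
    """Accelerated projected gradient for min_{H>=0} 0.5*||V - W@H||_F^2."""
    WtW, WtV = W.T @ W, W.T @ V
    L = np.linalg.norm(WtW, 2)        # Lipschitz constant of the gradient
    H, Y, t = H0.copy(), H0.copy(), 1.0
    for _ in range(iters):
        H_new = np.maximum(Y - (WtW @ Y - WtV) / L, 0.0)   # step + projection
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        Y = H_new + ((t - 1.0) / t_new) * (H_new - H)      # extrapolated point
        H, t = H_new, t_new
    return H

rng = np.random.default_rng(0)
W, V = rng.random((30, 5)), rng.random((30, 20))
H = nesterov_nnls(W, V, rng.random((5, 20)))
print(np.linalg.norm(V - W @ H))
```

Alternating such inner solves between the two factors gives the overall scheme; no line search is needed because L is computed directly.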

465 citations


Journal ArticleDOI
TL;DR: In this paper, the authors presented an infinitesimal-strain version of a formulation based on fast Fourier transforms (FFT) for the prediction of micromechanical fields in polycrystals deforming in the elasto-viscoplastic (EVP) regime.

441 citations


Journal ArticleDOI
TL;DR: This paper presents the Bee optimization method for harmonic elimination in a cascaded multilevel inverter; based on the food-foraging behavior of a swarm of honeybees, it has higher precision and probability of convergence than the genetic algorithm (GA).
Abstract: This paper presents the Bee optimization method for harmonic elimination in a cascaded multilevel inverter. The main objective in the selective harmonic elimination pulsewidth modulation strategy is eliminating low-order harmonics by solving nonlinear equations, while the fundamental component is satisfied. In this paper, the Bee algorithm (BA) is applied to a 7-level inverter for solving the equations. The algorithm is based on the food-foraging behavior of a swarm of honeybees, and it performs a neighborhood search combined with a random search. This method has higher precision and a higher probability of convergence than the genetic algorithm (GA). MATLAB software is used for optimization and for comparison of GA and BA. Simulation results show the superiority of BA over GA in attaining accurate global minima and a higher convergence rate. Moreover, its performance over 10 runs is the same as in a single run. Finally, for verification purposes, an experimental study is performed.
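A generic Bees Algorithm sketch (illustrative; the multimodal test function and all parameters are assumptions, not the selective-harmonic-elimination equations): random scouting combined with neighborhood search around the best sites.

```python
import numpy as np

def bees_algorithm(f, lo, hi, n_scouts=30, n_best=5, n_recruits=10,
                   radius=0.5, shrink=0.95, iters=100, seed=0):
    """Keep the best sites, search their neighborhoods with recruited bees,
    re-scout the rest of the space at random, and shrink the search radius."""
    rng = np.random.default_rng(seed)
    dim = len(lo)
    scouts = rng.uniform(lo, hi, size=(n_scouts, dim))
    for _ in range(iters):
        scouts = scouts[np.argsort([f(s) for s in scouts])]
        elites = []
        for site in scouts[:n_best]:           # neighborhood (local) search
            cand = np.clip(site + rng.uniform(-radius, radius,
                                              size=(n_recruits, dim)), lo, hi)
            best = min(cand, key=f)
            elites.append(best if f(best) < f(site) else site)
        rest = rng.uniform(lo, hi, size=(n_scouts - n_best, dim))  # random search
        scouts = np.vstack([elites, rest])
        radius *= shrink
    return min(scouts, key=f)

f = lambda z: np.sum(z**2) + 3 * np.sum(1 - np.cos(2 * np.pi * z))  # multimodal
print(np.round(bees_algorithm(f, np.full(2, -3.0), np.full(2, 3.0)), 3))
```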

337 citations


Journal ArticleDOI
TL;DR: It is proved that, independently of the structure of the convex nonsmooth function involved and of the given fast first order iterative scheme, it is always possible to improve the complexity rate and reach an $O(\varepsilon^{-1})$ efficiency estimate by solving an adequately smoothed approximation counterpart.
Abstract: We propose a unifying framework that combines smoothing approximation with fast first order algorithms for solving nonsmooth convex minimization problems. We prove that independently of the structure of the convex nonsmooth function involved, and of the given fast first order iterative scheme, it is always possible to improve the complexity rate and reach an $O(\varepsilon^{-1})$ efficiency estimate by solving an adequately smoothed approximation counterpart. Our approach relies on the combination of the notion of smoothable functions that we introduce with a natural extension of the Moreau-infimal convolution technique along with its connection to the smoothing mechanism via asymptotic functions. This allows for clarification and unification of several issues on the design, analysis, and potential applications of smoothing methods when combined with fast first order algorithms.
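A sketch of the smooth-then-accelerate recipe (illustrative; the l1-regularized least-squares instance and all parameters are assumptions): replace |.| by its Moreau envelope (the Huber function) with parameter mu, then run a fast gradient method on the smooth surrogate.

```python
import numpy as np

def huber_grad(x, mu):
    """Gradient of the Moreau envelope of |.|: clip(x/mu, -1, 1), elementwise."""
    return np.clip(x / mu, -1.0, 1.0)

def smoothed_accel(A, b, lam=0.5, mu=1e-3, iters=2000):
    """Fast gradient method on F_mu(x) = 0.5*||Ax-b||^2 + lam*Huber_mu(x),
    a smooth approximation of the l1-regularized objective."""
    L = np.linalg.norm(A, 2)**2 + lam / mu   # Lipschitz constant of grad F_mu
    x = y = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(iters):
        g = A.T @ (A @ y - b) + lam * huber_grad(y, mu)
        x_new = y - g / L
        t_new = 0.5 * (1 + np.sqrt(1 + 4 * t * t))
        y = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

A = np.random.default_rng(0).standard_normal((40, 80))
b = A[:, :4] @ np.ones(4)
print(np.round(smoothed_accel(A, b)[:8], 2))
```

Taking mu proportional to the target accuracy $\varepsilon$ makes $L = O(1/\varepsilon)$, and the fast method's $O(\sqrt{L/\varepsilon})$ iteration bound then gives the $O(\varepsilon^{-1})$ estimate above.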

Posted Content
TL;DR: A new averaging technique for the projected stochastic subgradient method is presented, using a weighted average with a weight of t+1 for each iterate w_t at iteration t to obtain the convergence rate of O(1/t) with both an easy proof and an easy implementation.
Abstract: In this note, we present a new averaging technique for the projected stochastic subgradient method. By using a weighted average with a weight of t+1 for each iterate w_t at iteration t, we obtain the convergence rate of O(1/t) with both an easy proof and an easy implementation. The new scheme is compared empirically to existing techniques, with similar performance behavior.
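The scheme is a one-liner to maintain online (a sketch on a one-dimensional strongly convex toy with assumed noise and projection radius):

```python
import numpy as np

def weighted_avg_sgd(T=20000, lam=1.0, R=10.0, seed=0):
    """Projected stochastic subgradient with the (t+1)-weighted average
    wbar = sum_t (t+1)*w_t / sum_t (t+1), maintained online."""
    rng = np.random.default_rng(seed)
    w, wbar, wsum = 0.0, 0.0, 0.0
    for t in range(1, T + 1):
        g = lam * (w - 1.0) + rng.standard_normal()   # noisy subgradient
        w = np.clip(w - g / (lam * t), -R, R)         # projected step
        wsum += t + 1
        wbar += (t + 1) * (w - wbar) / wsum           # online weighted average
    return wbar

print(weighted_avg_sgd())   # close to the minimizer w* = 1
```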

Journal ArticleDOI
TL;DR: A Newton-based extremum seeking algorithm for the multivariable case that allows all the parameters to converge with the same speed, yielding straight trajectories to the extremum even with maps that have highly elongated level sets, in contrast to the curved "steepest descent" trajectories of the gradient algorithm.

Journal ArticleDOI
TL;DR: A new greedy algorithm, called the orthogonal super greedy algorithm (OSGA), is built; based on the number of orthogonal projections and iterations, OSGA is observed to be s times simpler (more efficient) than OMP, where s is the number of dictionary elements selected per iteration.
Abstract: The general theory of greedy approximation is well developed. Much less is known about how specific features of a dictionary can be used to our advantage. In this paper, we discuss incoherent dictionaries. We build a new greedy algorithm which is called the orthogonal super greedy algorithm (OSGA). We show that the rates of convergence of OSGA and the orthogonal matching pursuit (OMP) with respect to incoherent dictionaries are the same. Based on the analysis of the number of orthogonal projections and the number of iterations, we observe that OSGA is s times simpler (more efficient) than OMP, where s is the number of dictionary elements selected per iteration. Greedy approximation is also a fundamental tool for sparse signal recovery. The performance of orthogonal multimatching pursuit, a counterpart of OSGA in the compressed sensing setting, is also analyzed under restricted isometry property conditions.
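A sketch of the super-greedy idea (illustrative; s denotes the number of atoms picked per step): proceed as in OMP, but select the s largest correlations before each orthogonal projection.

```python
import numpy as np

def osga(D, y, s=3, iters=5):
    """Pick the s atoms most correlated with the residual per iteration,
    then orthogonally project y onto the span of all selected atoms."""
    support, r = [], y.copy()
    for _ in range(iters):
        corr = np.abs(D.T @ r)
        corr[support] = 0.0                         # never re-pick an atom
        support += list(np.argsort(corr)[-s:])      # s new atoms (OMP takes 1)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        r = y - D[:, support] @ coef                # orthogonal residual
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((60, 200))
D /= np.linalg.norm(D, axis=0)                      # unit-norm dictionary
y = D[:, [5, 17, 42]] @ np.array([2.0, -1.5, 1.0])
print(np.flatnonzero(np.round(osga(D, y), 3)))      # indices of recovered atoms
```

With s = 1 this reduces to OMP; a larger s cuts the number of projections by roughly a factor of s, which is the efficiency gain described above.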

Journal ArticleDOI
TL;DR: In this paper, it was shown that the posterior distribution of a parameter in misspecified LAN parametric models can be approximated by a random normal distribution, and that Bayesian credible sets are not valid confidence sets if the model is misspecified.
Abstract: We prove that the posterior distribution of a parameter in misspecified LAN parametric models can be approximated by a random normal distribution. We derive from this that Bayesian credible sets are not valid confidence sets if the model is misspecified. We obtain the result under conditions that are comparable to those in the well-specified situation: uniform testability against fixed alternatives and sufficient prior mass in neighbourhoods of the point of convergence. The rate of convergence is considered in detail, with special attention for the existence and construction of suitable test sequences. We also give a lemma to exclude testable model subsets which implies a misspecified version of Schwartz’ consistency theorem, establishing weak convergence of the posterior to a measure degenerate at the point at minimal Kullback-Leibler divergence with respect to the true distribution.

Journal ArticleDOI
TL;DR: In this paper, a rate sharp minimax lower bound for estimating sparse covariance matrices under a range of matrix operator norm and Bregman divergence losses was derived, and a thresholding estimator was shown to attain the optimal rate of convergence under the spectral norm.
Abstract: This paper considers estimation of sparse covariance matrices and establishes the optimal rate of convergence under a range of matrix operator norm and Bregman divergence losses. A major focus is on the derivation of a rate sharp minimax lower bound. The problem exhibits new features that are significantly different from those that occur in the conventional nonparametric function estimation problems. Standard techniques fail to yield good results, and new tools are thus needed. We first develop a lower bound technique that is particularly well suited for treating “two-directional” problems such as estimating sparse covariance matrices. The result can be viewed as a generalization of Le Cam’s method in one direction and Assouad’s Lemma in another. This lower bound technique is of independent interest and can be used for other matrix estimation problems. We then establish a rate sharp minimax lower bound for estimating sparse covariance matrices under the spectral norm by applying the general lower bound technique. A thresholding estimator is shown to attain the optimal rate of convergence under the spectral norm. The results are then extended to ...

Journal ArticleDOI
TL;DR: An algebraic multigrid method is presented that has a guaranteed convergence rate for the class of nonsingular symmetric M-matrices with nonnegative row sum; optimal cost per iteration cannot be guaranteed in all cases but is analytically shown to hold for the model Poisson problem.
Abstract: We consider the iterative solution of large sparse symmetric positive definite linear systems. We present an algebraic multigrid method which has a guaranteed convergence rate for the class of nonsingular symmetric M-matrices with nonnegative row sum. The coarsening is based on the aggregation of the unknowns. A key ingredient is an algorithm that builds the aggregates while ensuring that the corresponding two-grid convergence rate is bounded by a user-defined parameter. For a sensible choice of this parameter, it is shown that the recursive use of the two-grid procedure yields a convergence independent of the number of levels, provided that one uses a proper AMLI-cycle. On the other hand, the computational cost per iteration step is of optimal order if the mean aggregate size is large enough. This cannot be guaranteed in all cases but is analytically shown to hold for the model Poisson problem. For more general problems, a wide range of experiments suggests that there are no complexity issues and further demonstrates the robustness of the method. The experiments are performed on systems obtained from low order finite difference or finite element discretizations of second order elliptic partial differential equations (PDEs). The set includes two- and three-dimensional problems, with both structured and unstructured grids, some of them with local refinement and/or re-entrant corners, and with possible jumps or anisotropies in the PDE coefficients.

Journal ArticleDOI
TL;DR: The challenges of obtaining fast and accurate solutions of the coupled nonlinear WHAM equations, of quantifying the statistical errors of the resulting free energies, of diagnosing possible systematic errors, and of optimally allocating the computational resources are addressed.
Abstract: The weighted histogram analysis method (WHAM) has become the standard technique for the analysis of umbrella sampling simulations. In this article, we address the challenges (1) of obtaining fast and accurate solutions of the coupled nonlinear WHAM equations, (2) of quantifying the statistical errors of the resulting free energies, (3) of diagnosing possible systematic errors, and (4) of optimally allocating the computational resources. Traditionally, the WHAM equations are solved by a fixed-point direct iteration method, despite poor convergence and possible numerical inaccuracies in the solutions. Here, we instead solve the mathematically equivalent problem of maximizing a target likelihood function, by using superlinear numerical optimization algorithms with a significantly faster convergence rate. To estimate the statistical errors in one-dimensional free energy profiles obtained from WHAM, we note that for densely spaced umbrella windows with harmonic biasing potentials, the WHAM free energy profile can be approximated by a coarse-grained free energy obtained by integrating the mean restraining forces. The statistical errors of the coarse-grained free energies can be estimated straightforwardly and then used for the WHAM results. A generalization to multidimensional WHAM is described. We also propose two simple statistical criteria to test the consistency between the histograms of adjacent umbrella windows, which help identify inadequate sampling and hysteresis in the degrees of freedom orthogonal to the reaction coordinate. Together, the estimates of the statistical errors and the diagnostics of inconsistencies in the potentials of mean force provide a basis for the efficient allocation of computational resources in free energy simulations.
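For concreteness, a toy one-dimensional reconstruction via the fixed-point (direct) iteration mentioned above (illustrative; the double-well profile, window placement, spring constant, and sample counts are all assumptions, with energies in kT units):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 200)                       # reaction-coordinate grid
F_true = (x**2 - 1.0)**2                          # double-well free energy (kT)
centers = np.linspace(-1.5, 1.5, 7)               # umbrella window centers
bias = 0.5 * 20.0 * (x[None, :] - centers[:, None])**2   # harmonic U_k(x)
c = np.exp(-bias)                                 # bias Boltzmann factors

# Synthetic samples from each biased window, histogrammed on the grid.
N = 2000
n = np.zeros_like(c)
for k in range(len(centers)):
    p_k = np.exp(-F_true) * c[k]
    idx = rng.choice(len(x), size=N, p=p_k / p_k.sum())
    n[k] = np.bincount(idx, minlength=len(x))

# WHAM fixed-point iteration: unbiased distribution and window constants.
f = np.ones(len(centers))
for _ in range(2000):
    P = n.sum(axis=0) / (N * f[:, None] * c).sum(axis=0)
    f = 1.0 / (c * P[None, :]).sum(axis=1)        # self-consistency condition
print(np.round(-np.log(P / P.max())[::40], 2))    # recovered profile (shifted)
```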

Journal ArticleDOI
TL;DR: A new, easy-to-implement, nonparametric VSS-NLMS algorithm is proposed that employs the mean-square error and the estimated system noise power to control the step-size update; its theoretical steady-state behavior is in very good agreement with experimental results.
Abstract: Numerous variable step-size normalized least mean-square (VSS-NLMS) algorithms have been derived over the past two decades to resolve the dilemma between fast convergence rate and low excess mean-square error. This paper proposes a new, easy-to-implement, nonparametric VSS-NLMS algorithm that employs the mean-square error and the estimated system noise power to control the step-size update. Theoretical analysis of its steady-state behavior shows that, when the input is zero-mean Gaussian distributed, the misadjustment depends only on a parameter β controlling the update of the step size. Simulation experiments show that the proposed algorithm performs very well. Furthermore, the theoretical steady-state behavior is in very good agreement with the experimental results.
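A schematic VSS-NLMS loop (an assumed, simplified step-size rule in the spirit described, not the paper's exact update; the system noise power is taken as known here):

```python
import numpy as np

def vss_nlms(x, d, taps=8, noise_var=1e-3, alpha=0.99, eps=1e-8):
    """NLMS whose step size shrinks as the smoothed squared error
    approaches the noise floor: large steps early, small near convergence."""
    w = np.zeros(taps)
    mse = 1.0
    for n in range(taps - 1, len(d)):
        u = x[n - taps + 1:n + 1][::-1]       # regressor, newest sample first
        e = d[n] - w @ u
        mse = alpha * mse + (1 - alpha) * e * e         # smoothed error power
        mu = max(0.0, (mse - noise_var) / (mse + eps))  # variable step in [0, 1)
        w += mu * e * u / (u @ u + eps)                 # normalized LMS update
    return w

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = np.array([1.0, -0.5, 0.25, 0.0, 0.1, 0.0, 0.0, 0.05])   # unknown system
d = np.convolve(x, h)[:len(x)] + np.sqrt(1e-3) * rng.standard_normal(len(x))
print(np.round(vss_nlms(x, d), 2))            # close to h
```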

Journal ArticleDOI
TL;DR: The smoothing proximal gradient (SPG) method as discussed by the authors combines a smoothing technique with an effective proximal gradient method to solve structured sparse regression problems with any smooth convex loss under a wide spectrum of structured sparsity-inducing penalties.
Abstract: We study the problem of estimating high-dimensional regression models regularized by a structured sparsity-inducing penalty that encodes prior structural information on either the input or output variables. We consider two widely adopted types of penalties of this kind as motivating examples: (1) the general overlapping-group-lasso penalty, generalized from the group-lasso penalty; and (2) the graph-guided-fused-lasso penalty, generalized from the fused-lasso penalty. For both types of penalties, due to their nonseparability and nonsmoothness, developing an efficient optimization method remains a challenging problem. In this paper we propose a general optimization approach, the smoothing proximal gradient (SPG) method, which can solve structured sparse regression problems with any smooth convex loss under a wide spectrum of structured sparsity-inducing penalties. Our approach combines a smoothing technique with an effective proximal gradient method. It achieves a convergence rate significantly faster than standard first-order methods such as subgradient methods, and is much more scalable than the most widely used interior-point methods. The efficiency and scalability of our method are demonstrated on both simulation experiments and real genetic data sets.

Journal ArticleDOI
TL;DR: It is shown that Newton's method, which is often used for solving nonlinear equations, converges under weaker convergence criteria than those given in earlier studies, such as Argyros (2004) and Hilout (2010).

Journal ArticleDOI
TL;DR: In this paper, a polarization-based iterative scheme was proposed for computing the macroscopic properties of elastic composites with an arbitrary contrast, which is nearly as simple as the basic schemes (strain and stress based) but which has the ability to compute the overall properties of multiphase composites.
Abstract: It is recognized that the convergence of FFT-based iterative schemes used for computing the effective properties of elastic composite materials drastically depends on the contrast between the phases. In particular, the rate of convergence of the strain-based iterative scheme strongly decreases when the composites contain very stiff inclusions, and the method diverges in the case of rigid inclusions. Conversely, the stress-based iterative scheme converges rapidly for composites with very stiff or rigid inclusions, but leads to low convergence rates when soft inclusions are considered and diverges for composites containing voids. It follows that the computation of effective properties is costly when the heterogeneous medium contains both soft and stiff phases. In particular, the problem of composites containing voids and rigid inclusions cannot be solved by the strain- or stress-based approaches. In this paper, we propose a new polarization-based iterative scheme for computing the macroscopic properties of elastic composites with arbitrary contrast; it is nearly as simple as the basic (strain- and stress-based) schemes but is able to compute the overall properties of multiphase composites with arbitrary elastic moduli, as illustrated through several examples.

Journal ArticleDOI
TL;DR: It is proved that the QPSO algorithm is a form of contraction mapping and can converge to the global optimum; a new definition of the convergence rate of a stochastic algorithm is provided, along with definitions of three types of convergence according to the correlations between the convergence rates and the objective function values.

Journal ArticleDOI
TL;DR: This work considers the problem of numerically approximating the solution of an elliptic partial differential equation with random coefficients and homogeneous Dirichlet boundary conditions, and focuses on the case of a lognormal coefficient, obtaining a weak rate of convergence which is twice the strong one.
Abstract: We consider the problem of numerically approximating the solution of an elliptic partial differential equation with random coefficients and homogeneous Dirichlet boundary conditions. We focus on the case of a lognormal coefficient and deal with the lack of uniform coercivity and uniform boundedness with respect to the randomness. This model is frequently used in hydrogeology. We approximate this coefficient by a finite dimensional noise using a truncated Karhunen-Loève expansion. We give estimates of the corresponding error on the solution, both a strong error estimate and a weak error estimate, that is, an estimate of the error committed on the law of the solution. We obtain a weak rate of convergence which is twice the strong one. In addition, we give a complete error estimate for the stochastic collocation method in this case, where neither coercivity nor boundedness is stochastically uniform. To conclude, we apply these results of strong and weak convergence to two classical choices of covariance kernel: an exponential covariance kernel on a box and an analytic covariance kernel, yielding explicit weak and strong convergence rates.
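A sketch of the truncated Karhunen-Loève step for the exponential-covariance case (a discretized eigen-expansion on a uniform grid, ignoring quadrature weights; grid size, correlation length, and truncation level are assumptions):

```python
import numpy as np

def lognormal_kl_sample(n=200, M=20, ell=0.3, sigma=1.0, seed=0):
    """Sample a lognormal coefficient a(x) = exp(g(x)) on [0, 1], where g has
    exponential covariance, via a truncated KL expansion with M modes."""
    x = np.linspace(0, 1, n)
    C = sigma**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / ell)
    lam, phi = np.linalg.eigh(C)                    # ascending eigenvalues
    lam, phi = lam[::-1][:M], phi[:, ::-1][:, :M]   # keep the M largest modes
    xi = np.random.default_rng(seed).standard_normal(M)
    g = phi @ (np.sqrt(np.maximum(lam, 0.0)) * xi)  # truncated KL realization
    return x, np.exp(g)                             # strictly positive sample

x, a = lognormal_kl_sample()
print(a.min(), a.max())
```

Each realization is positive, yet there are no deterministic coercivity or boundedness constants valid for all realizations, which is exactly the difficulty the paper addresses.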

Journal ArticleDOI
TL;DR: Finite difference methods for the Gross-Pitaevskii equation with an angular momentum rotation term in two and three dimensions are analyzed, and bounds on the errors between the discrete mass and energy and their corresponding continuous counterparts are derived.
Abstract: We analyze finite difference methods for the Gross-Pitaevskii equation with an angular momentum rotation term in two and three dimensions and obtain the optimal convergence rate, for the conservative Crank-Nicolson finite difference (CNFD) method and the semi-implicit finite difference (SIFD) method, at the order of $O(h^2 + \tau^2)$ in the $l^2$-norm and discrete $H^1$-norm with time step $\tau$ and mesh size $h$. Besides the standard techniques of the energy method, the key technique in the analysis of the SIFD method is to use mathematical induction, while for the CNFD method it is to obtain an a priori bound of the numerical solution in the $l^\infty$-norm by using the inverse inequality and the $l^2$-norm error estimate. In addition, for the SIFD method, we also derive bounds on the errors between the discrete mass and energy and their corresponding continuous counterparts, which are at the same order as the convergence rate of the numerical solution itself. Finally, numerical results are reported to confirm our error estimates of the numerical methods.

Proceedings ArticleDOI
27 Aug 2012
TL;DR: An extension of RRT, called RRT*-Smart, is proposed, which aims to accelerate its rate of convergence and to reach an optimum or near optimum solution at a much faster rate and at a reduced execution time.
Abstract: Rapidly Exploring Random Tree (RRT) is one of the quickest and most efficient obstacle-free path-finding algorithms. However, it cannot guarantee finding the most optimal path. A recently proposed extension of RRT, known as Rapidly Exploring Random Tree Star (RRT∗), claims to achieve convergence towards the optimal solution, but has been proven to take infinite time to do so, with a slow convergence rate. To overcome these limitations, we propose an extension of RRT∗, called RRT∗-Smart, which aims to accelerate its rate of convergence and to reach an optimum or near-optimum solution at a much faster rate and at a reduced execution time. Our novel algorithm incorporates two new techniques into RRT∗: path optimization and intelligent sampling. Simulation results presented in various obstacle-cluttered environments confirm the efficiency of RRT∗-Smart.
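A sketch of the path-optimization ingredient alone (illustrative; intelligent sampling is omitted and the circular obstacles are hypothetical): greedily replace chains of tree nodes by straight segments wherever these are collision-free.

```python
import numpy as np

def collision_free(p, q, obstacles, steps=50):
    """Check the segment p->q against circular obstacles (center, radius)."""
    for t in np.linspace(0.0, 1.0, steps):
        pt = (1 - t) * p + t * q
        if any(np.linalg.norm(pt - c) <= r for c, r in obstacles):
            return False
    return True

def shortcut(path, obstacles):
    """Connect each node to the farthest later node reachable in a straight,
    collision-free line, discarding the intermediate nodes."""
    path = [np.asarray(p, float) for p in path]
    out, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not collision_free(path[i], path[j], obstacles):
            j -= 1
        out.append(path[j])
        i = j
    return out

obstacles = [(np.array([0.5, 0.5]), 0.2)]
path = [(0, 0), (0.1, 0.4), (0.2, 0.8), (0.6, 0.9), (1, 1)]
print(shortcut(path, obstacles))   # fewer nodes, shorter total length
```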

Journal ArticleDOI
TL;DR: An approach is developed in which a guarantee on the convergence rate is given thanks to an aggregation algorithm that allows explicit control of the location of the eigenvalues of the preconditioned matrix.
Abstract: We consider the iterative solution of large sparse linear systems arising from the upwind finite difference discretization of convection-diffusion equations. The system matrix is then an M-matrix with nonnegative row sum, and, further, when the convective flow has zero divergence, the column sum is also nonnegative, possibly up to a small correction term. We investigate aggregation-based algebraic multigrid methods for this class of matrices. A theoretical analysis is developed for a simplified two-grid scheme with one damped Jacobi postsmoothing step. An uncommon feature of this analysis is that it applies directly to problems with variable coefficients; e.g., to problems with recirculating convective flow. On the basis of this theory, we develop an approach in which a guarantee is given on the convergence rate thanks to an aggregation algorithm that allows an explicit control of the location of the eigenvalues of the preconditioned matrix. Some issues that remain beyond the analysis are discussed in the...

Journal ArticleDOI
TL;DR: This paper presents a comprehensive theoretical performance analysis of $l_0$-LMS for white Gaussian input data, based on assumptions that are reasonable over a large range of parameter settings.
Abstract: As one of the recently proposed algorithms for sparse system identification, the $l_0$ norm constraint Least Mean Square ($l_0$-LMS) algorithm modifies the cost function of the traditional method with a penalty on tap-weight sparsity. The performance of $l_0$-LMS is quite attractive compared with its various precursors. However, there has been no detailed study of its performance. This paper presents a comprehensive theoretical performance analysis of $l_0$-LMS for white Gaussian input data, based on assumptions that are reasonable over a large range of parameter settings. Expressions for the steady-state mean square deviation (MSD) are derived and discussed with respect to algorithm parameters and system sparsity. A parameter selection rule is established for achieving the best performance. Approximated with a Taylor series, the instantaneous behavior is also derived. In addition, the relationships between $l_0$-LMS and several earlier algorithms, together with sufficient conditions for $l_0$-LMS to accelerate convergence, are established. Finally, all of the theoretical results are compared with simulations and are shown to agree well over a wide range of parameters.
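A schematic $l_0$-LMS-style update (illustrative; it uses the common exponential approximation of the $l_0$ norm, and all parameter values are assumptions rather than the paper's selection rule):

```python
import numpy as np

def l0_lms(x, d, taps=16, mu=0.05, kappa=5e-4, beta=10.0):
    """LMS plus a zero-attraction term from ||w||_0 ~ sum(1 - exp(-beta|w_i|));
    near-zero taps are pulled toward zero, promoting sparse estimates."""
    w = np.zeros(taps)
    for n in range(taps - 1, len(d)):
        u = x[n - taps + 1:n + 1][::-1]
        e = d[n] - w @ u
        attract = kappa * beta * np.sign(w) * np.exp(-beta * np.abs(w))
        w = w + mu * e * u - attract      # LMS step plus sparsity penalty pull
    return w

rng = np.random.default_rng(0)
x = rng.standard_normal(20000)
h = np.zeros(16); h[[2, 7]] = [1.0, -0.5]          # sparse unknown system
d = np.convolve(x, h)[:len(x)] + 0.01 * rng.standard_normal(len(x))
print(np.round(l0_lms(x, d), 2))                   # mostly zeros, spikes at 2 and 7
```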

Posted Content
TL;DR: A heuristic analysis is provided that suggests that in many cases adaptively restarting allows us to recover the optimal rate of convergence with no prior knowledge of function parameters.
Abstract: In this paper we demonstrate a simple heuristic adaptive restart technique that can dramatically improve the convergence rate of accelerated gradient schemes. The analysis of the technique relies on the observation that these schemes exhibit two modes of behavior depending on how much momentum is applied. In what we refer to as the 'high momentum' regime the iterates generated by an accelerated gradient scheme exhibit a periodic behavior, where the period is proportional to the square root of the local condition number of the objective function. This suggests a restart technique whereby we reset the momentum whenever we observe periodic behavior. We provide analysis to show that in many cases adaptively restarting allows us to recover the optimal rate of convergence with no prior knowledge of function parameters.
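A sketch of gradient-based adaptive restart on an ill-conditioned quadratic (illustrative; the restart test resets momentum when the momentum direction opposes the gradient, and the test problem is an assumption):

```python
import numpy as np

def agd_restart(grad, x0, L, iters=500):
    """Accelerated gradient descent that resets its momentum whenever
    grad(y)'(x_new - x) > 0, i.e. when momentum points uphill."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        g = grad(y)
        x_new = y - g / L
        if g @ (x_new - x) > 0:              # observed the periodic overshoot
            y, t = x_new.copy(), 1.0         # restart: drop all momentum
        else:
            t_new = 0.5 * (1 + np.sqrt(1 + 4 * t * t))
            y = x_new + ((t - 1) / t_new) * (x_new - x)
            t = t_new
        x = x_new
    return x

A = np.diag(np.logspace(0, 3, 20))           # condition number 1000
x = agd_restart(lambda z: A @ z, np.ones(20), L=1000.0)
print(np.linalg.norm(x))                     # near the minimizer at the origin
```

No function values or condition-number estimates are needed, matching the 'no prior knowledge of function parameters' point above.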

Journal ArticleDOI
TL;DR: In this paper, the authors derive an averaged equation for a class of stochastic partial differential equations without any Lipschitz assumption on the slow modes, obtaining the rate of convergence in probability as a byproduct.