
Showing papers on "Rate of convergence" published in 2009


Journal ArticleDOI
TL;DR: A new fast iterative shrinkage-thresholding algorithm (FISTA) is presented that preserves the computational simplicity of ISTA but has a global rate of convergence proven to be significantly better, both theoretically and practically.
Abstract: We consider the class of iterative shrinkage-thresholding algorithms (ISTA) for solving linear inverse problems arising in signal/image processing. This class of methods, which can be viewed as an extension of the classical gradient algorithm, is attractive due to its simplicity and thus is adequate for solving large-scale problems even with dense matrix data. However, such methods are also known to converge quite slowly. In this paper we present a new fast iterative shrinkage-thresholding algorithm (FISTA) which preserves the computational simplicity of ISTA but with a global rate of convergence which is proven to be significantly better, both theoretically and practically. Initial promising numerical results for wavelet-based image deblurring demonstrate the capabilities of FISTA which is shown to be faster than ISTA by several orders of magnitude.
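For a concrete picture of the scheme, here is a minimal sketch of a FISTA-style iteration applied to the l1-regularized least-squares problem min 0.5||Ax - b||² + λ||x||₁; the specific objective, the step size 1/L with L = ||A||₂², and the iteration count are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding (shrinkage) operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista(A, b, lam, n_iter=200):
    """FISTA-style iteration for min 0.5*||Ax-b||^2 + lam*||x||_1 (illustrative sketch)."""
    L = np.linalg.norm(A, 2) ** 2                       # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)                        # gradient of the smooth part at y
        x_new = soft_threshold(y - grad / L, lam / L)   # ISTA (proximal) step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x

# Illustrative use on a random underdetermined system.
rng = np.random.default_rng(0)
x_hat = fista(rng.standard_normal((50, 100)), rng.standard_normal(50), lam=0.1)
```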

11,413 citations


Journal ArticleDOI
TL;DR: The authors' convergence rate results explicitly characterize the tradeoff between a desired accuracy of the generated approximate optimal solutions and the number of iterations needed to achieve the accuracy.
Abstract: We study a distributed computation model for optimizing a sum of convex objective functions corresponding to multiple agents. For solving this (not necessarily smooth) optimization problem, we consider a subgradient method that is distributed among the agents. The method involves every agent minimizing his/her own objective function while exchanging information locally with other agents in the network over a time-varying topology. We provide convergence results and convergence rate estimates for the subgradient method. Our convergence rate results explicitly characterize the tradeoff between a desired accuracy of the generated approximate optimal solutions and the number of iterations needed to achieve the accuracy.
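As a rough illustration of the distributed model described above, the following sketch runs a consensus-plus-subgradient iteration with a fixed doubly stochastic weight matrix and simple per-agent objectives f_i(x) = |x - a_i|; the fixed topology, step size, and objectives are illustrative assumptions, not the paper's time-varying setting.

```python
import numpy as np

def distributed_subgradient(W, targets, alpha=0.01, n_iter=500):
    """Each agent averages its neighbors' estimates through W, then takes a
    subgradient step on its local objective f_i(x) = |x - targets[i]| (illustrative)."""
    x = np.zeros(len(targets))                         # one scalar estimate per agent
    for _ in range(n_iter):
        mixed = W @ x                                  # local information exchange (consensus step)
        x = mixed - alpha * np.sign(mixed - targets)   # local subgradient step
    return x

# Example: 4 agents on a ring with uniform weights; the common minimizer of
# sum_i |x - a_i| is a median of the targets.
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])
print(distributed_subgradient(W, np.array([1.0, 2.0, 3.0, 10.0])))
```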

3,238 citations


Journal ArticleDOI
TL;DR: A fast algorithm is derived for the constrained TV-based image deblurring problem with box constraints by combining an acceleration of the well known dual approach to the denoising problem with a novel monotone version of a fast iterative shrinkage/thresholding algorithm (FISTA).
Abstract: This paper studies gradient-based schemes for image denoising and deblurring problems based on the discretized total variation (TV) minimization model with constraints. We derive a fast algorithm for the constrained TV-based image deblurring problem. To achieve this task, we combine an acceleration of the well known dual approach to the denoising problem with a novel monotone version of a fast iterative shrinkage/thresholding algorithm (FISTA) we have recently introduced. The resulting gradient-based algorithm shares a remarkable simplicity together with a proven global rate of convergence which is significantly better than currently known gradient projections-based methods. Our results are applicable to both the anisotropic and isotropic discretized TV functionals. Initial numerical results demonstrate the viability and efficiency of the proposed algorithms on image deblurring problems with box constraints.

1,981 citations


Journal ArticleDOI
TL;DR: In this paper, a randomized version of the Kaczmarz method for consistent, overdetermined linear systems is introduced and it is shown that it converges with expected exponential rate.
Abstract: The Kaczmarz method for solving linear systems of equations is an iterative algorithm that has found many applications ranging from computer tomography to digital signal processing. Despite the popularity of this method, useful theoretical estimates for its rate of convergence are still scarce. We introduce a randomized version of the Kaczmarz method for consistent, overdetermined linear systems and we prove that it converges with expected exponential rate. Furthermore, this is the first solver whose rate does not depend on the number of equations in the system. The solver does not even need to know the whole system but only a small random part of it. It thus outperforms all previously known methods on general extremely overdetermined systems. Even for moderately overdetermined systems, numerical simulations as well as theoretical analysis reveal that our algorithm can converge faster than the celebrated conjugate gradient algorithm. Furthermore, our theory and numerical simulations confirm a prediction of Feichtinger et al. in the context of reconstructing bandlimited functions from nonuniform sampling.
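A compact sketch of the randomized Kaczmarz iteration for a consistent overdetermined system Ax = b, with rows sampled with probability proportional to their squared norms as in the paper; the fixed iteration budget and the synthetic test system are illustrative assumptions.

```python
import numpy as np

def randomized_kaczmarz(A, b, n_iter=5000, seed=0):
    """Randomized Kaczmarz: project the iterate onto a randomly chosen hyperplane
    {x : a_i^T x = b_i}, choosing row i with probability ||a_i||^2 / ||A||_F^2."""
    rng = np.random.default_rng(seed)
    row_norms_sq = np.sum(A * A, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        i = rng.choice(A.shape[0], p=probs)
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x

# Consistent overdetermined test system (illustrative).
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 50))
x_true = rng.standard_normal(50)
print(np.linalg.norm(randomized_kaczmarz(A, A @ x_true) - x_true))
```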

768 citations


Journal ArticleDOI
TL;DR: It is shown that A-ND achieves the best of both worlds, zero bias and low variance, at the cost of a slow convergence rate; rescaling the weights balances the variance against the rate of bias reduction (convergence rate).
Abstract: The paper studies average consensus with random topologies (intermittent links) and noisy channels. Consensus with noise in the network links leads to the bias-variance dilemma: running consensus for long reduces the bias of the final average estimate but increases its variance. We present two different compromises to this tradeoff: the A-ND algorithm modifies conventional consensus by forcing the weights to satisfy a persistence condition (slowly decaying to zero), and the A-NC algorithm uses constant weights but runs consensus for a fixed number of iterations $\hat{\iota}$, then restarts and reruns it for a total of $\hat{p}$ runs, averaging the final states of the $\hat{p}$ runs (Monte Carlo averaging). We use controlled Markov processes and stochastic approximation arguments to prove almost sure convergence of A-ND to a finite consensus limit and compute explicitly the mean square error (mse) (variance) of the consensus limit. We show that A-ND represents the best of both worlds, zero bias and low variance, at the cost of a slow convergence rate; rescaling the weights balances the variance versus the rate of bias reduction (convergence rate). In contrast, A-NC, because of its constant weights, converges fast but presents a different bias-variance tradeoff: for the same total number of iterations $\hat{\iota}\hat{p}$, shorter runs (smaller $\hat{\iota}$) lead to higher bias but smaller variance (a larger number $\hat{p}$ of runs to average over). For a static nonrandom network with Gaussian noise, we compute the optimal gain for A-NC to reach, in the shortest number of iterations $\hat{\iota}\hat{p}$ and with high probability $1-\delta$, $(\epsilon,\delta)$-consensus ($\epsilon$ residual bias). Our results hold under fairly general assumptions on the random link failures and communication noise.
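The toy simulation below conveys the flavor of the decaying-weight (A-ND style) idea: additive link noise is averaged out by weights that decay slowly to zero. The particular network, noise level, and weight sequence are assumptions chosen for illustration, not the paper's exact algorithm.

```python
import numpy as np

def noisy_consensus(x0, adjacency, noise_std=0.1, n_iter=2000, seed=0):
    """Consensus with noisy links and persistent, slowly decaying weights
    alpha_t ~ 1/(t+1)^0.75 (an illustrative choice satisfying a persistence condition)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    for t in range(n_iter):
        alpha = 0.3 / (t + 1) ** 0.75
        noise = noise_std * rng.standard_normal(x.shape)   # channel noise on received values
        x = x - alpha * (laplacian @ x + noise)
    return x

adjacency = np.array([[0, 1, 0, 1],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [1, 0, 1, 0]], dtype=float)
x0 = [1.0, 5.0, 3.0, 7.0]
print(noisy_consensus(x0, adjacency), np.mean(x0))
```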

687 citations


Proceedings ArticleDOI
19 Apr 2009
TL;DR: A new approach to adaptive system identification when the system model is sparse is proposed, which yields a zero-attracting LMS (ZA-LMS) and a reweighted zero-attracting LMS (RZA-LMS), and it is proved that the ZA-LMS can achieve lower mean square error than the standard LMS.
Abstract: We propose a new approach to adaptive system identification when the system model is sparse. The approach applies l1 relaxation, common in compressive sensing, to improve the performance of LMS-type adaptive methods. This results in two new algorithms, the zero-attracting LMS (ZA-LMS) and the reweighted zero-attracting LMS (RZA-LMS). The ZA-LMS is derived by incorporating an l1-norm penalty on the coefficients into the quadratic LMS cost function, which generates a zero attractor in the LMS iteration. The zero attractor promotes sparsity in the taps during the filtering process, and therefore accelerates convergence when identifying sparse systems. We prove that the ZA-LMS can achieve lower mean square error than the standard LMS. To further improve the filtering performance, the RZA-LMS is developed using a reweighted zero attractor. The performance of the RZA-LMS is numerically superior to that of the ZA-LMS. Experiments demonstrate the advantages of the proposed filters in both convergence rate and steady-state behavior under sparsity assumptions on the true coefficient vector. The RZA-LMS is also shown to be robust when the number of non-zero taps increases.
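A minimal sketch of the zero-attracting LMS update described above: the standard LMS recursion plus a small sign-based term that pulls inactive taps toward zero. The filter length, step size, and attractor strength are illustrative assumptions.

```python
import numpy as np

def za_lms(x, d, n_taps=16, mu=0.01, rho=5e-4):
    """Zero-attracting LMS: w <- w + mu*e*u - rho*sign(w), where the sign term is the
    zero attractor induced by an l1 penalty on the taps (illustrative sketch)."""
    w = np.zeros(n_taps)
    for n in range(n_taps - 1, len(x)):
        u = x[n - n_taps + 1:n + 1][::-1]    # regressor [x[n], x[n-1], ..., x[n-n_taps+1]]
        e = d[n] - w @ u                     # a priori error
        w += mu * e * u - rho * np.sign(w)   # LMS step plus zero attractor
    return w

# Identify a sparse FIR system from noisy observations (illustrative).
rng = np.random.default_rng(0)
h = np.zeros(16); h[[2, 9]] = [1.0, -0.5]    # sparse true system
x = rng.standard_normal(5000)
d = np.convolve(x, h)[:len(x)] + 0.01 * rng.standard_normal(len(x))
print(np.round(za_lms(x, d), 2))
```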

681 citations


Proceedings ArticleDOI
14 Jun 2009
TL;DR: This paper exploits the special structure of the trace norm, based on which an extended gradient algorithm that converges as O(1/k) is proposed, along with an accelerated gradient algorithm that achieves the optimal convergence rate of O(1/k²) for smooth problems.
Abstract: We consider the minimization of a smooth loss function regularized by the trace norm of the matrix variable. Such a formulation finds applications in many machine learning tasks including multi-task learning, matrix classification, and matrix completion. The standard semidefinite programming formulation for this problem is computationally expensive. In addition, due to the non-smooth nature of the trace norm, the optimal first-order black-box method for solving this class of problems converges as O(1/√k), where k is the iteration counter. In this paper, we exploit the special structure of the trace norm, based on which we propose an extended gradient algorithm that converges as O(1/k). We further propose an accelerated gradient algorithm, which achieves the optimal convergence rate of O(1/k²) for smooth problems. Experiments on multi-task learning problems demonstrate the efficiency of the proposed algorithms.
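The computational ingredient that makes such gradient schemes practical is that the proximal operator of the trace norm is singular value soft-thresholding. Below is a minimal proximal-gradient sketch built on that operator (no line search or acceleration); the denoising example and parameter values are assumptions for illustration.

```python
import numpy as np

def prox_trace_norm(W, tau):
    """Proximal operator of tau*||W||_* : soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def proximal_gradient(grad_f, W0, step, lam, n_iter=100):
    """Gradient step on the smooth loss followed by the trace-norm prox (illustrative)."""
    W = W0.copy()
    for _ in range(n_iter):
        W = prox_trace_norm(W - step * grad_f(W), step * lam)
    return W

# Example: low-rank matrix denoising, min 0.5*||W - Y||_F^2 + lam*||W||_*.
rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 5)) @ rng.standard_normal((5, 30)) + 0.1 * rng.standard_normal((30, 30))
W_hat = proximal_gradient(lambda W: W - Y, np.zeros_like(Y), step=1.0, lam=2.0)
print(np.linalg.matrix_rank(W_hat))
```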

600 citations


Journal ArticleDOI
TL;DR: In this article, a generalized pre-averaging approach for estimating the integrated volatility is presented, which can generate rate-optimal estimators with convergence rate n^{1/4}; this rate is, however, not guaranteed in general.

525 citations


Journal ArticleDOI
TL;DR: It is proved that the random consensus value is, in expectation, the average of initial node measurements and that it can be made arbitrarily close to this value in mean squared error sense, under a balanced connectivity model and by trading off convergence speed with accuracy of the computation.
Abstract: Motivated by applications to wireless sensor, peer-to-peer, and ad hoc networks, we study distributed broadcasting algorithms for exchanging information and computing in an arbitrarily connected network of nodes. Specifically, we study a broadcasting-based gossiping algorithm to compute the (possibly weighted) average of the initial measurements of the nodes at every node in the network. We show that the broadcast gossip algorithm converges almost surely to a consensus. We prove that the random consensus value is, in expectation, the average of initial node measurements and that it can be made arbitrarily close to this value in the mean squared error sense, under a balanced connectivity model and by trading off convergence speed with accuracy of the computation. We provide theoretical and numerical results on the mean square error performance and on the convergence rate, and study the effect of the "mixing parameter" on the convergence rate of the broadcast gossip algorithm. The results indicate that the mean squared error strictly decreases through iterations until the consensus is achieved. Finally, we assess and compare the communication cost of the broadcast gossip algorithm to achieve a given distance to consensus through theoretical and numerical results.
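A toy version of a broadcast-gossip-style update: a randomly chosen node broadcasts its value and each neighbor moves a fraction γ (the mixing parameter) toward it. The graph, γ, and iteration count are illustrative assumptions.

```python
import numpy as np

def broadcast_gossip(x0, neighbors, gamma=0.5, n_iter=2000, seed=0):
    """Broadcast gossip sketch: a uniformly random node wakes up and broadcasts; each of
    its neighbors j updates x_j <- (1-gamma)*x_j + gamma*x_broadcaster (illustrative)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        i = rng.integers(len(x))
        for j in neighbors[i]:
            x[j] = (1.0 - gamma) * x[j] + gamma * x[i]
    return x

# Ring of 5 nodes; the consensus value equals the initial average only in expectation.
neighbors = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
x0 = [2.0, 4.0, 6.0, 8.0, 10.0]
print(broadcast_gossip(x0, neighbors), np.mean(x0))
```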

516 citations


Journal ArticleDOI
TL;DR: In this paper, the authors studied the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions.
Abstract: This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the application, sparsity may occur a priori on the covariance matrix, its inverse or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order (s_n log p_n / n)^{1/2}, where s_n is the number of nonzero elements, p_n is the size of the covariance matrix and n is the sample size. This explicitly spells out that the contribution of high dimensionality is merely a logarithmic factor. The conditions on the rate at which the tuning parameter λ_n goes to 0 have been made explicit and compared under different penalties. As a result, for the L_1 penalty, to guarantee the sparsistency and optimal rate of convergence, the number of nonzero elements should be small: s_n' = O(p_n) at most, among O(p_n²) parameters, for estimating a sparse covariance or correlation matrix, a sparse precision or inverse correlation matrix, or a sparse Cholesky factor, where s_n' is the number of nonzero off-diagonal entries. On the other hand, using the SCAD or hard-thresholding penalty functions, there is no such restriction.

509 citations


Journal Article
TL;DR: This paper proposes a new importance estimation method that has a closed-form solution; the leave-one-out cross-validation score can also be computed analytically and is computationally highly efficient and simple to implement.
Abstract: We address the problem of estimating the ratio of two probability density functions, which is often referred to as the importance. The importance values can be used for various succeeding tasks such as covariate shift adaptation or outlier detection. In this paper, we propose a new importance estimation method that has a closed-form solution; the leave-one-out cross-validation score can also be computed analytically. Therefore, the proposed method is computationally highly efficient and simple to implement. We also elucidate theoretical properties of the proposed method such as the convergence rate and approximation error bounds. Numerical experiments show that the proposed method is comparable to the best existing method in accuracy, while it is computationally more efficient than competing approaches.
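The abstract does not spell out the estimator, but one standard way to obtain such a closed form is to fit a kernel model of the density ratio by regularized least squares; the Gaussian kernel design, its width, and the regularization below are assumptions for illustration only, not necessarily the paper's exact method.

```python
import numpy as np

def ratio_fit(x_num, x_den, sigma=0.5, lam=0.1):
    """Closed-form least-squares fit of the density ratio p_num/p_den with Gaussian
    kernel basis functions centered at the numerator samples (illustrative sketch)."""
    def kernel(a, b):
        d2 = (a[:, None] - b[None, :]) ** 2
        return np.exp(-d2 / (2.0 * sigma ** 2))
    Phi_den = kernel(x_den, x_num)                     # basis evaluated on denominator samples
    Phi_num = kernel(x_num, x_num)                     # basis evaluated on numerator samples
    H = Phi_den.T @ Phi_den / len(x_den)
    h = Phi_num.mean(axis=0)
    alpha = np.linalg.solve(H + lam * np.eye(len(x_num)), h)   # closed-form coefficients
    return lambda x: kernel(np.atleast_1d(x), x_num) @ alpha   # estimated ratio function

rng = np.random.default_rng(0)
w = ratio_fit(rng.normal(0.0, 1.0, 200), rng.normal(0.5, 1.2, 200))
print(w(np.array([0.0, 1.0])))
```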

Journal ArticleDOI
TL;DR: The proposed methodology may allow the upgrading of an existing evaluation to incorporate the genomic information when the information attributable to genomics can be expressed as modifications to the numerator relationship matrix.

Journal ArticleDOI
TL;DR: It is shown that the speed of convergence of the k-NN method can be further improved by an adaptive choice of k, and the new universal estimator of divergence is proved to be asymptotically unbiased and mean-square consistent.
Abstract: A new universal estimator of divergence is presented for multidimensional continuous densities based on k-nearest-neighbor (k-NN) distances. Assuming independent and identically distributed (i.i.d.) samples, the new estimator is proved to be asymptotically unbiased and mean-square consistent. In experiments with high-dimensional data, the k-NN approach generally exhibits faster convergence than previous algorithms. It is also shown that the speed of convergence of the k-NN method can be further improved by an adaptive choice of k.
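A sketch of a k-NN divergence estimator of this general form for the KL divergence D(p||q), built from k-th nearest-neighbor distances within the p-sample and from the p-sample to the q-sample; the specific constants, fixed k, and the Gaussian test case are illustrative assumptions.

```python
import numpy as np

def knn_kl_divergence(x, y, k=5):
    """k-NN estimate of KL divergence D(p||q) from samples x ~ p and y ~ q, using
    k-th nearest-neighbor distances (illustrative sketch of this estimator family)."""
    n, d = x.shape
    m = y.shape[0]
    # k-th NN distance of each x_i within x (index k skips the zero self-distance)
    dxx = np.sort(np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1), axis=1)[:, k]
    # k-th NN distance of each x_i within y
    dxy = np.sort(np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1), axis=1)[:, k - 1]
    return d * np.mean(np.log(dxy / dxx)) + np.log(m / (n - 1))

rng = np.random.default_rng(0)
p_samples = rng.normal(0.0, 1.0, size=(2000, 1))
q_samples = rng.normal(1.0, 1.0, size=(2000, 1))
print(knn_kl_divergence(p_samples, q_samples))   # true D(N(0,1)||N(1,1)) = 0.5
```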

Journal ArticleDOI
TL;DR: In this article, a general approximation of the l0 norm, a typical metric of system sparsity, is proposed and integrated into the cost function of the LMS algorithm; this is equivalent to adding a zero attractor to the iterations, by which the convergence rate of the small coefficients that dominate a sparse system can be effectively improved.
Abstract: In order to improve the performance of least mean square (LMS) based system identification of sparse systems, a new adaptive algorithm is proposed which utilizes the sparsity property of such systems. A general approximation of the l0 norm, a typical metric of system sparsity, is proposed and integrated into the cost function of the LMS algorithm. This integration is equivalent to adding a zero attractor to the iterations, by which the convergence rate of the small coefficients that dominate a sparse system can be effectively improved. Moreover, a partial updating method reduces the computational complexity. The simulations demonstrate that the proposed algorithm can effectively improve the performance of LMS-based identification algorithms for sparse systems.
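For concreteness, here is a sketch using one common smooth surrogate of the l0 norm, ||w||_0 ≈ Σ_i (1 − e^{−β|w_i|}), whose gradient yields a zero attractor acting mainly on small taps; the surrogate, β, and step sizes are illustrative assumptions, and partial updating is omitted. (Compare with the ZA-LMS sketch earlier, which uses a plain sign attractor.)

```python
import numpy as np

def l0_lms(x, d, n_taps=16, mu=0.01, kappa=5e-4, beta=5.0):
    """LMS with an approximate l0 penalty: the attractor kappa*beta*sign(w)*exp(-beta*|w|)
    pulls only the small taps toward zero, leaving large taps nearly untouched (sketch)."""
    w = np.zeros(n_taps)
    for n in range(n_taps - 1, len(x)):
        u = x[n - n_taps + 1:n + 1][::-1]    # regressor of the most recent samples
        e = d[n] - w @ u                     # a priori error
        w += mu * e * u - kappa * beta * np.sign(w) * np.exp(-beta * np.abs(w))
    return w

# Can be run on the same (x, d) data as the ZA-LMS sketch above.
```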

Journal ArticleDOI
TL;DR: The alternating minimization algorithm is extended to the case of recovering blurry multichannel (color) images corrupted by impulsive rather than Gaussian noise, and is shown to have attractive convergence properties, including finite convergence for some variables and a $q$-linear convergence rate.
Abstract: We extend the alternating minimization algorithm recently proposed in [Y. Wang, J. Yang, W. Yin, and Y. Zhang, SIAM J. Imag. Sci., 1 (2008), pp. 248-272]; [J. Yang, W. Yin, Y. Zhang, and Y. Wang, SIAM J. Imag. Sci., 2 (2009), pp. 569-592] to the case of recovering blurry multichannel (color) images corrupted by impulsive rather than Gaussian noise. The algorithm minimizes the sum of a multichannel extension of total variation and a data fidelity term measured in the $\ell_1$-norm, and is applicable to both salt-and-pepper and random-valued impulsive noise. We derive the algorithm by applying the well-known quadratic penalty function technique and prove attractive convergence properties, including finite convergence for some variables and $q$-linear convergence rate. Under periodic boundary conditions, the main computational requirements of the algorithm are fast Fourier transforms and a low-complexity Gaussian elimination procedure. Numerical results on images with different blurs and impulsive noise are presented to demonstrate the efficiency of the algorithm. In addition, it is numerically compared to the least absolute deviation method [H. Y. Fu, M. K. Ng, M. Nikolova, and J. L. Barlow, SIAM J. Sci. Comput., 27 (2006), pp. 1881-1902] and the two-phase method [J. F. Cai, R. Chan, and M. Nikolova, AIMS J. Inverse Problems and Imaging, 2 (2008), pp. 187-204] for recovering grayscale images. We also present results of recovering multichannel images.

Journal ArticleDOI
TL;DR: It is shown that the CVT energy function has 2nd order smoothness for convex domains with smooth density, as well as in most situations encountered in optimization, which makes it possible to minimize it with Newton-like optimization methods.
Abstract: Centroidal Voronoi tessellation (CVT) is a particular type of Voronoi tessellation that has many applications in computational sciences and engineering, including computer graphics. The prevailing method for computing CVT is Lloyd's method, which has linear convergence and is inefficient in practice. We develop new efficient methods for CVT computation and demonstrate the fast convergence of these methods. Specifically, we show that the CVT energy function has 2nd order smoothness for convex domains with smooth density, as well as in most situations encountered in optimization. Due to the 2nd order smoothness, it is possible to minimize the CVT energy functions using Newton-like optimization methods and expect fast convergence. We propose a quasi-Newton method to compute CVT and demonstrate its faster convergence than Lloyd's method with various numerical examples. It is also significantly faster and more robust than the Lloyd-Newton method, a previous attempt to accelerate CVT. We also demonstrate surface remeshing as a possible application.
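For context, here is a Monte Carlo sketch of Lloyd's method, the linearly convergent baseline that the paper accelerates: generators are repeatedly moved to the centroids of their Voronoi regions, approximated by assigning uniform samples of the unit square to their nearest generator. The sampling approximation and the parameter values are illustrative assumptions.

```python
import numpy as np

def lloyd_cvt(n_generators=10, n_samples=20000, n_iter=50, seed=0):
    """Monte Carlo Lloyd iteration for a CVT of the unit square with uniform density:
    assign samples to the nearest generator, then move each generator to the mean
    (centroid) of its assigned samples (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    samples = rng.random((n_samples, 2))
    generators = rng.random((n_generators, 2))
    for _ in range(n_iter):
        dists = np.linalg.norm(samples[:, None, :] - generators[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)                  # Voronoi assignment
        for i in range(n_generators):
            pts = samples[labels == i]
            if len(pts):
                generators[i] = pts.mean(axis=0)       # centroid update
    return generators

print(lloyd_cvt())
```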

Journal ArticleDOI
TL;DR: In this paper, a forward solver for Bayesian inverse problems based on generalized polynomial chaos (gPC) stochastic collocation is proposed, and fast (exponential) convergence of the gPC forward solution is shown to yield similarly fast convergence of the approximate posterior.
Abstract: We present an efficient numerical strategy for the Bayesian solution of inverse problems. Stochastic collocation methods, based on generalized polynomial chaos (gPC), are used to construct a polynomial approximation of the forward solution over the support of the prior distribution. This approximation then defines a surrogate posterior probability density that can be evaluated repeatedly at minimal computational cost. The ability to simulate a large number of samples from the posterior distribution results in very accurate estimates of the inverse solution and its associated uncertainty. Combined with high accuracy of the gPC-based forward solver, the new algorithm can provide great efficiency in practical applications. A rigorous error analysis of the algorithm is conducted, where we establish convergence of the approximate posterior to the true posterior and obtain an estimate of the convergence rate. It is proved that fast (exponential) convergence of the gPC forward solution yields similarly fast (exponential) convergence of the posterior. The numerical strategy and the predicted convergence rates are then demonstrated on nonlinear inverse problems of varying smoothness and dimension. AMS subject classifications: 41A10, 60H35, 65C30, 65C50

Journal ArticleDOI
TL;DR: In this paper, the rate of convergence of the microscopic quantum mechanical evolution towards the limiting Hartree dynamics was studied and bounds on the difference between the one-particle density associated with the solution of the N-body Schrodinger equation and the orthogonal projection onto the Hartree equation were established.
Abstract: The nonlinear Hartree equation describes the macroscopic dynamics of initially factorized N-boson states, in the limit of large N. In this paper we provide estimates on the rate of convergence of the microscopic quantum mechanical evolution towards the limiting Hartree dynamics. More precisely, we prove bounds on the difference between the one-particle density associated with the solution of the N-body Schrodinger equation and the orthogonal projection onto the solution of the Hartree equation.

Journal ArticleDOI
TL;DR: An extension of Newton's method for unconstrained multiobjective optimization (multicriteria optimization) is proposed that is locally superlinearly convergent to optimal points; its convergence analysis uses a Kantorovich-like technique.
Abstract: We propose an extension of Newton's method for unconstrained multiobjective optimization (multicriteria optimization). This method does not use a priori chosen weighting factors or any other form of a priori ranking or ordering information for the different objective functions. Newton's direction at each iterate is obtained by minimizing the max-ordering scalarization of the variations on the quadratic approximations of the objective functions. The objective functions are assumed to be twice continuously differentiable and locally strongly convex. Under these hypotheses, the method, as in the classical case, is locally superlinearly convergent to optimal points. Again as in the scalar case, if the second derivatives are Lipschitz continuous, the rate of convergence is quadratic. Our convergence analysis uses a Kantorovich-like technique. As a byproduct, existence of optima is obtained under semilocal assumptions.

Journal ArticleDOI
TL;DR: It is proved that von Neumann’s method of “alternating projections” converges locally to a point in the intersection, at a linear rate associated with a modulus of regularity.
Abstract: The idea of a finite collection of closed sets having “linearly regular intersection” at a point is crucial in variational analysis. This central theoretical condition also has striking algorithmic consequences: in the case of two sets, one of which satisfies a further regularity condition (convexity or smoothness, for example), we prove that von Neumann’s method of “alternating projections” converges locally to a point in the intersection, at a linear rate associated with a modulus of regularity. As a consequence, in the case of several arbitrary closed sets having linearly regular intersection at some point, the method of “averaged projections” converges locally at a linear rate to a point in the intersection. Inexact versions of both algorithms also converge linearly.
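A minimal sketch of von Neumann's alternating projections between two convex sets, here an affine subspace and the nonnegative orthant; the choice of sets and the iteration budget are illustrative assumptions, while the paper's result covers far more general regularly intersecting (possibly nonconvex) sets.

```python
import numpy as np

def alternating_projections(A, b, x0, n_iter=500):
    """Alternate projections onto the affine set {x : Ax = b} and the set {x >= 0};
    with linearly regular intersection the iterates converge linearly (sketch)."""
    A_pinv = np.linalg.pinv(A)               # used for the projection onto the affine set
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        x = x - A_pinv @ (A @ x - b)          # project onto {x : Ax = b}
        x = np.maximum(x, 0.0)                # project onto the nonnegative orthant
    return x

# Example: project a starting point into the intersection of a hyperplane and the orthant.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
print(alternating_projections(A, b, x0=[-1.0, 2.0, 3.0]))
```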

Journal ArticleDOI
TL;DR: In this paper, a special gradient projection method is introduced that exploits effective scaling strategies and steplength updating rules, appropriately designed for improving the convergence rate, and the authors give convergence results for this scheme and evaluate its effectiveness by means of an extensive computational study on the minimization problems arising from the maximum likelihood approach to image deblurring.
Abstract: A class of scaled gradient projection methods for optimization problems with simple constraints is considered. These iterative algorithms can be useful in variational approaches to image deblurring that lead to the minimization of convex nonlinear functions subject to non-negativity constraints and, in some cases, to an additional flux conservation constraint. A special gradient projection method is introduced that exploits effective scaling strategies and steplength updating rules, appropriately designed for improving the convergence rate. We give convergence results for this scheme and we evaluate its effectiveness by means of an extensive computational study on the minimization problems arising from the maximum likelihood approach to image deblurring. Comparisons with the standard expectation maximization algorithm and with other iterative regularization schemes are also reported to show the computational gain provided by the proposed method.

Proceedings ArticleDOI
28 Jun 2009
TL;DR: An adaptive line search scheme is proposed that tunes the step size adaptively while guaranteeing the optimal convergence rate; experiments demonstrate the efficiency of the proposed Lassplore algorithm for large-scale problems.
Abstract: Logistic regression is a well-known classification method that has been used widely in many applications of data mining, machine learning, computer vision, and bioinformatics. Sparse logistic regression embeds feature selection in the classification framework using the l1-norm regularization, and is attractive in many applications involving high-dimensional data. In this paper, we propose Lassplore for solving large-scale sparse logistic regression. Specifically, we formulate the problem as l1-ball constrained smooth convex optimization, and propose to solve it using Nesterov's method, an optimal first-order black-box method for smooth convex optimization. One of the critical issues in the use of Nesterov's method is the estimation of the step size at each optimization iteration. Previous approaches either apply a constant step size, which assumes that the Lipschitz constant of the gradient is known in advance, or require a sequence of decreasing step sizes, which leads to slow convergence in practice. In this paper, we propose an adaptive line search scheme which tunes the step size adaptively while guaranteeing the optimal convergence rate. Empirical comparisons with several state-of-the-art algorithms demonstrate the efficiency of the proposed Lassplore algorithm for large-scale problems.
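The sketch below shows the generic acceleration-plus-backtracking pattern for a smooth convex objective: the Lipschitz estimate L is increased until the standard quadratic upper bound holds. It is not the paper's exact Lassplore update and omits the l1-ball constraint; the logistic-style test objective is an assumption for illustration.

```python
import numpy as np

def nesterov_backtracking(f, grad, x0, L0=1.0, eta=2.0, n_iter=100):
    """Accelerated gradient method with a backtracking estimate of the Lipschitz
    constant L: increase L until the quadratic upper bound holds (illustrative sketch)."""
    x = np.array(x0, dtype=float)
    y, t, L = x.copy(), 1.0, L0
    for _ in range(n_iter):
        g = grad(y)
        while True:                                   # adaptive line search on L
            x_new = y - g / L
            if f(x_new) <= f(y) + g @ (x_new - y) + 0.5 * L * np.sum((x_new - y) ** 2):
                break
            L *= eta
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x

# Example: smooth convex logistic-style objective f(x) = sum log(1 + exp(-b_i * a_i^T x)).
rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 5)), rng.choice([-1.0, 1.0], 100)
f = lambda x: np.sum(np.log1p(np.exp(-b * (A @ x))))
grad = lambda x: -A.T @ (b / (1.0 + np.exp(b * (A @ x))))
print(nesterov_backtracking(f, grad, np.zeros(5)))
```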

Journal ArticleDOI
TL;DR: A full-wave solver to model large-scale and complex multiscale structures using the augmented electric field integral equation (A-EFIE), which includes both the charge and the current as unknowns to avoid the imbalance between the vector potential and the scalar potential in the conventional EFIE.
Abstract: We describe a full-wave solver to model large-scale and complex multiscale structures. It uses the augmented electric field integral equation (A-EFIE), which includes both the charge and the current as unknowns to avoid the imbalance between the vector potential and the scalar potential in the conventional EFIE. The formulation proves to be stable in the low-frequency regime with the appropriate frequency scaling and the enforcement of charge neutrality. To conquer large-scale and complex problems, we solve the equation using iterative methods, design an efficient constraint preconditioner, and employ the mixed-form fast multipole algorithm (FMA) to accelerate the matrix-vector product. Numerical tests on various examples show high accuracy and fast convergence. Finally, complex interconnect and packaging problems with over one million integral equation unknowns can be solved without the help of a parallel computer.

Journal ArticleDOI
TL;DR: In this article, the problem of bridging the gap between two scales in neuronal modeling is addressed, where neurons are considered individually and their behavior described by stochastic differential equations that govern the time variations of their membrane potentials.
Abstract: We deal with the problem of bridging the gap between two scales in neuronal modeling. At the first (microscopic) scale, neurons are considered individually and their behavior described by stochastic differential equations that govern the time variations of their membrane potentials. They are coupled by synaptic connections acting on their resulting activity, a nonlinear function of their membrane potential. At the second (mesoscopic) scale, interacting populations of neurons are described individually by similar equations. The equations describing the dynamical and the stationary mean field behaviors are considered as functional equations on a set of stochastic processes. Using this new point of view allows us to prove that these equations are well-posed on any finite time interval and to provide, by a fixed point method, a constructive method for effectively computing their unique solution. This method is proved to converge to the unique solution and we characterize its complexity and convergence rate. We also provide partial results for the stationary problem on infinite time intervals. These results shed some new light on such neural mass models as the one of Jansen and Rit (Jansen and Rit 1995): their dynamics appears as a coarse approximation of the much richer dynamics that emerges from our analysis. Our numerical experiments confirm that the framework we propose and the numerical methods we derive from it provide a new and powerful tool for the exploration of neural behaviors at different scales.

Journal ArticleDOI
TL;DR: In this article, a strategy for regularizing the inversion procedure for the two-dimensional D-bar reconstruction algorithm based on the global uniqueness proof of Nachman [Ann. Math. 143] for the ill-posed inverse conductivity problem is presented.
Abstract: A strategy for regularizing the inversion procedure for the two-dimensional D-bar reconstruction algorithm based on the global uniqueness proof of Nachman [Ann. Math. 143 (1996)] for the ill-posed inverse conductivity problem is presented. The strategy utilizes truncation of the boundary integral equation and the scattering transform. It is shown that this leads to a bound on the error in the scattering transform and a stable reconstruction of the conductivity; an explicit rate of convergence in appropriate Banach spaces is derived as well. Numerical results are also included, demonstrating the convergence of the reconstructed conductivity to the true conductivity as the noise level tends to zero. The results provide a link between two traditions of inverse problems research: theory of regularization and inversion methods based on complex geometrical optics. Also, the procedure is a novel regularized imaging method for electrical impedance tomography.

Proceedings ArticleDOI
06 Dec 2009
TL;DR: An accelerated gradient method based on an "optimal" first-order black-box method due to Nesterov is proposed, with a convergence rate guarantee for smooth convex loss functions; it significantly outperforms state-of-the-art methods in both convergence speed and learning accuracy.
Abstract: Many real-world learning problems can be recast as multi-task learning problems, which utilize correlations among different tasks to obtain better generalization performance than learning each task individually. The feature selection problem in the multi-task setting has many applications in the fields of computer vision, text classification and bioinformatics. Generally, it can be realized by solving an L-1-infinity regularized optimization problem, and the solution automatically yields the joint sparsity among different tasks. However, due to the nonsmooth nature of the L-1-infinity norm, there lacks an efficient training algorithm for solving such problems with general convex loss functions. In this paper, we propose an accelerated gradient method based on an "optimal" first-order black-box method named after Nesterov and provide the convergence rate for smooth convex loss functions. For nonsmooth convex loss functions, such as the hinge loss, our method still has a fast convergence rate empirically. Moreover, by exploiting the structure of the L-1-infinity ball, we solve the black-box oracle in Nesterov's method by a simple sorting scheme. Our method is suitable for large-scale multi-task learning problems since it only utilizes the first-order information and is very easy to implement. Experimental results show that our method significantly outperforms the most state-of-the-art methods in both convergence speed and learning accuracy.

Proceedings ArticleDOI
19 Apr 2009
TL;DR: A Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) is presented which preserves the computational simplicity of ISTA, but with a global rate of convergence which is proven to be significantly better, both theoretically and practically.
Abstract: We consider the class of Iterative Shrinkage-Thresholding Algorithms (ISTA) for solving linear inverse problems arising in signal/image processing. This class of methods is attractive due to its simplicity; however, they are also known to converge quite slowly. In this paper we present a Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) which preserves the computational simplicity of ISTA, but with a global rate of convergence which is proven to be significantly better, both theoretically and practically. Initial promising numerical results for wavelet-based image deblurring demonstrate the capabilities of FISTA.

Journal ArticleDOI
TL;DR: In this paper, the asymptotics for jump-penalized least squares regression aiming at approximating a regression function by piecewise constant functions are studied and it is shown that these estimators are in an adaptive sense rate optimal over certain classes of "approximation spaces."
Abstract: We study the asymptotics for jump-penalized least squares regression aiming at approximating a regression function by piecewise constant functions. Besides conventional consistency and convergence rates of the estimates in L^2([0,1)), our results cover other metrics like the Skorokhod metric on the space of cadlag functions and uniform metrics on C([0,1]). We will show that these estimators are, in an adaptive sense, rate optimal over certain classes of "approximation spaces." Special cases are the class of functions of bounded variation, (piecewise) Hölder continuous functions of order 0 < α ≤ 1, and the class of step functions with a finite but arbitrary number of jumps. In the latter setting, we will also deduce the rates known from change-point analysis for detecting the jumps. Finally, the issue of fully automatic selection of the smoothing parameter is addressed.

Journal ArticleDOI
TL;DR: An upper bound on the steady-state error of a classic gradient neural network exploited to invert online time-varying matrices is estimated, and computer-simulation results substantiate the performance analysis.
Abstract: This technical note presents theoretical analysis and simulation results on the performance of a classic gradient neural network (GNN), which was designed originally for constant matrix inversion but is now exploited for time-varying matrix inversion. Compared to the constant matrix-inversion case, the gradient neural network inverting a time-varying matrix could only approximately approach its time-varying theoretical inverse, instead of converging exactly. In other words, the steady-state error between the GNN solution and the theoretical/exact inverse does not vanish to zero. In this technical note, the upper bound of such an error is estimated firstly. The global exponential convergence rate is then analyzed for such a Hopfield-type neural network when approaching the bound error. Computer-simulation results finally substantiate the performance analysis of this gradient neural network exploited to invert online time-varying matrices.
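An Euler-discretized sketch of gradient-neural-network dynamics of the form dX/dt = −γ Aᵀ(t)(A(t)X − I) applied to a time-varying matrix; the discretization step, gain γ, and the example A(t) are illustrative assumptions.

```python
import numpy as np

def gnn_time_varying_inverse(A_of_t, n, gamma=50.0, dt=1e-3, t_end=2.0):
    """Euler-discretized gradient neural network dX/dt = -gamma * A(t)^T (A(t) X - I),
    tracking the inverse of a time-varying matrix with a bounded steady-state error (sketch)."""
    X = np.eye(n)
    t = 0.0
    while t < t_end:
        A = A_of_t(t)
        X = X - dt * gamma * A.T @ (A @ X - np.eye(n))
        t += dt
    return X

# Example time-varying matrix: a 2x2 symmetric matrix that stays well conditioned.
A_of_t = lambda t: np.array([[2.0 + np.sin(t), 0.3],
                             [0.3, 2.0 + np.cos(t)]])
X_end = gnn_time_varying_inverse(A_of_t, n=2)
print(np.linalg.norm(X_end @ A_of_t(2.0) - np.eye(2)))   # residual tracking error
```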

Journal ArticleDOI
TL;DR: In this article, the extended finite element method (XFEM) enables the accurate approximation of solutions with jumps or kinks within elements and achieves high-order convergence for arbitrary curved interfaces.
Abstract: The extended finite element method (XFEM) enables the accurate approximation of solutions with jumps or kinks within elements. Optimal convergence rates have frequently been achieved for linear elements and piecewise planar interfaces. Higher-order convergence for arbitrary curved interfaces relies on two major issues: (i) an accurate quadrature of the Galerkin weak form for the cut elements and (ii) a careful formulation of the enrichment, which should preclude any problems in the blending elements. For (i), we employ a strategy of subdividing the elements into subcells with only one curved side. Reference elements that are higher-order on only one side are then used to map the integration points to the real element. For (ii), we find that enrichments for strong discontinuities are easily extended to higher-order accuracy. In contrast, problems in blending elements may hinder optimal convergence for weak discontinuities. Different formulations are investigated, including the corrected XFEM. Numerical results for several test cases involving strong or weak curved discontinuities are presented. Quadratic and cubic approximations are investigated. Optimal convergence rates are achieved using the standard XFEM for the case of a strong discontinuity. Close-to-optimal convergence rates for the case of a weak discontinuity are achieved using the corrected XFEM. Copyright © 2009 John Wiley & Sons, Ltd.