Author

Seung-Jean Kim

Other affiliations: Citigroup
Bio: Seung-Jean Kim is an academic researcher from Stanford University. The author has contributed to research on topics including convex optimization and geometric programming, has an h-index of 25, and has co-authored 56 publications receiving 7,461 citations. Previous affiliations of Seung-Jean Kim include Citigroup.


Papers
Journal Article
TL;DR: In this paper, a specialized interior-point method is described for solving large-scale ℓ1-regularized least-squares programs (LSPs), which can be reformulated as convex quadratic programs; the method uses the preconditioned conjugate gradients (PCG) algorithm to compute the search direction.
Abstract: Recently, a lot of attention has been paid to regularization-based methods for sparse signal reconstruction (e.g., basis pursuit denoising and compressed sensing) and feature selection (e.g., the Lasso algorithm) in signal processing, statistics, and related fields. These problems can be cast as ℓ1-regularized least-squares programs (LSPs), which can be reformulated as convex quadratic programs and then solved by several standard methods such as interior-point methods, at least for small and medium-size problems. In this paper, we describe a specialized interior-point method for solving large-scale ℓ1-regularized LSPs that uses the preconditioned conjugate gradients algorithm to compute the search direction. The interior-point method can solve large sparse problems, with a million variables and observations, in a few tens of minutes on a PC. It can efficiently solve large dense problems that arise in sparse signal recovery with orthogonal transforms, by exploiting fast algorithms for these transforms. The method is illustrated on a magnetic resonance imaging data set.

2,047 citations
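
For readers who want to experiment with this problem class, here is a minimal proximal-gradient (ISTA) sketch for the objective ||Ax - b||² + λ||x||₁. It is not the interior-point/PCG method of the paper; the synthetic data, step size, and λ are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_ls_ista(A, b, lam, iters=500):
    """Proximal-gradient iteration for  minimize ||A x - b||_2^2 + lam * ||x||_1."""
    n = A.shape[1]
    x = np.zeros(n)
    # Step size 1/L, where L = 2 * sigma_max(A)^2 bounds the gradient's Lipschitz constant.
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
    return x

# Synthetic sparse-recovery instance (sizes and lam are illustrative).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 400))
x_true = np.zeros(400)
x_true[:10] = rng.standard_normal(10)
b = A @ x_true + 0.01 * rng.standard_normal(100)
x_hat = l1_ls_ista(A, b, lam=1.0)
print("nonzero entries recovered:", int(np.sum(np.abs(x_hat) > 1e-3)))
```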

Journal Article
TL;DR: This tutorial paper collects together in one place the basic background material needed to do GP modeling, and shows how to recognize functions and problems compatible with GP and how to approximate functions or data in a form compatible with GP.
Abstract: A geometric program (GP) is a type of mathematical optimization problem characterized by objective and constraint functions that have a special form. Recently developed solution methods can solve even large-scale GPs extremely efficiently and reliably; at the same time a number of practical problems, particularly in circuit design, have been found to be equivalent to (or well approximated by) GPs. Putting these two together, we get effective solutions for the practical problems. The basic approach in GP modeling is to attempt to express a practical problem, such as an engineering analysis or design problem, in GP format. In the best case, this formulation is exact; when this is not possible, we settle for an approximate formulation. This tutorial paper collects together in one place the basic background material needed to do GP modeling. We start with the basic definitions and facts, and some methods used to transform problems into GP format. We show how to recognize functions and problems compatible with GP, and how to approximate functions or data in a form compatible with GP (when this is possible). We give some simple and representative examples, and also describe some common extensions of GP, along with methods for solving (or approximately solving) them.

1,215 citations
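
As a hedged illustration of GP modeling, the sketch below poses a small geometric program through CVXPY's disciplined geometric programming interface (solve(gp=True)); the particular objective and constraints are assumed for the example and are not taken from the paper.

```python
import cvxpy as cp

# Positive variables are required for disciplined geometric programming (DGP).
x = cp.Variable(pos=True)
y = cp.Variable(pos=True)
z = cp.Variable(pos=True)

# Maximize a monomial subject to posynomial and monomial constraints.
objective = cp.Maximize(x * y * z)
constraints = [
    4 * x * y * z + 2 * x * z <= 10,  # posynomial <= constant
    x <= 2 * y,
    y <= 2 * x,
    z >= 1,
]
problem = cp.Problem(objective, constraints)
problem.solve(gp=True)  # tells CVXPY to interpret the problem as a GP
print("optimal value:", problem.value)
print("x, y, z =", x.value, y.value, z.value)
```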

Journal Article
TL;DR: The problem of finding the (symmetric) edge weights that result in the least mean-square deviation in steady state is considered and it is shown that this problem can be cast as a convex optimization problem, so the global solution can be found efficiently.

1,166 citations

01 Jan 2007
TL;DR: In this article, an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems is described; a variant that uses a preconditioned conjugate gradient method to compute the search step can solve very large problems, such as the 20 Newsgroups data set, in a few minutes on a PC.
Abstract: Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium-sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, which uses a preconditioned conjugate gradient method to compute the search step, can solve very large problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few minutes, on a PC. Using warm-start techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.

596 citations
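
A small sketch of ℓ1-regularized logistic regression for feature selection, using scikit-learn's generic solver rather than the interior-point method described above; the synthetic data and the regularization level C are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic classification data with many irrelevant features (illustrative).
X, y = make_classification(n_samples=500, n_features=1000,
                           n_informative=20, random_state=0)

# C is the inverse regularization strength; smaller C gives a sparser model.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
print("nonzero coefficients:", int(np.sum(clf.coef_ != 0)))
```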

Journal Article
TL;DR: This paper proposes a variation on Hodrick-Prescott (H-P) filtering, a widely used method for trend estimation; the proposed method substitutes a sum of absolute values for the sum of squares used in H-P filtering to penalize variations in the estimated trend.
Abstract: The problem of estimating underlying trends in time series data arises in a variety of disciplines. In this paper we propose a variation on Hodrick-Prescott (H-P) filtering, a widely used method for trend estimation. The proposed $\ell_1$ trend filtering method substitutes a sum of absolute values (i.e., $\ell_1$ norm) for the sum of squares used in H-P filtering to penalize variations in the estimated trend. The $\ell_1$ trend filtering method produces trend estimates that are piecewise linear, and therefore it is well suited to analyzing time series with an underlying piecewise linear trend. The kinks, knots, or changes in slope of the estimated trend can be interpreted as abrupt changes or events in the underlying dynamics of the time series. Using specialized interior-point methods, $\ell_1$ trend filtering can be carried out with not much more effort than H-P filtering; in particular, the number of arithmetic operations required grows linearly with the number of data points. We describe the method and some of its basic properties and give some illustrative examples. We show how the method is related to $\ell_1$ regularization-based methods in sparse signal recovery and feature selection, and we list some extensions of the basic method.

577 citations
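
To make the formulation concrete, the sketch below solves minimize (1/2)||y - x||² + λ||Dx||₁, with D the second-order difference operator, using a generic CVXPY solver instead of the specialized interior-point method described in the paper; the synthetic series and λ are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

# Synthetic series with a continuous piecewise-linear trend plus noise (illustrative only).
n = 300
t = np.arange(n, dtype=float)
trend_true = np.piecewise(
    t,
    [t < 100, (t >= 100) & (t < 200), t >= 200],
    [lambda s: 0.5 * s,
     lambda s: 50.0 - 0.3 * (s - 100.0),
     lambda s: 20.0 + 0.8 * (s - 200.0)],
)
y = trend_true + np.random.default_rng(0).normal(0.0, 2.0, n)

x = cp.Variable(n)
lam = 50.0  # illustrative regularization weight
# Second-order differences x[i+2] - 2 x[i+1] + x[i]; penalizing their l1 norm
# promotes a piecewise-linear trend estimate.
D2x = x[2:] - 2 * x[1:-1] + x[:-2]
objective = cp.Minimize(0.5 * cp.sum_squares(y - x) + lam * cp.norm1(D2x))
cp.Problem(objective).solve()
trend_est = x.value
print("max deviation from true trend:", float(np.max(np.abs(trend_est - trend_true))))
```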


Cited by
Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.

17,433 citations
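
As a rough sketch of how ADMM applies to one of the problems mentioned above (the lasso), the code below alternates a quadratic x-update, a soft-thresholding z-update, and a dual u-update. The penalty parameter ρ, λ, and the data are assumptions, and the stopping criteria and scaling heuristics discussed in the review are omitted.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding (proximal operator of t * ||.||_1)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_admm(A, b, lam, rho=1.0, iters=200):
    """ADMM for  minimize (1/2)||A x - b||_2^2 + lam * ||x||_1  (no stopping test)."""
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)
    AtA_rhoI = A.T @ A + rho * np.eye(n)  # reused in every x-update
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(AtA_rhoI, Atb + rho * (z - u))  # quadratic (ridge-like) step
        z = soft_threshold(x + u, lam / rho)                # l1 proximal step
        u = u + x - z                                       # scaled dual update
    return z

# Small synthetic lasso instance (sizes, lam, rho are illustrative).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))
x_true = np.zeros(200)
x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = lasso_admm(A, b, lam=0.1)
print("nonzero entries:", int(np.sum(np.abs(x_hat) > 1e-4)))
```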

Journal Article
TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods; they can handle large problems and also deal efficiently with sparse features.
Abstract: We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.

13,656 citations
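
A brief sketch in the same spirit as the coordinate-descent path algorithms described above, using scikit-learn's enet_path (its coordinate-descent elastic-net solver) rather than the glmnet implementation itself; the synthetic data and path settings are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import enet_path

# Synthetic sparse regression data (illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 500))
coef_true = np.zeros(500)
coef_true[:10] = rng.standard_normal(10)
y = X @ coef_true + 0.1 * rng.standard_normal(200)

# l1_ratio mixes the lasso (1.0) and ridge (0.0) penalties; the path is
# computed by cyclical coordinate descent over a decreasing grid of alphas.
alphas, coefs, _ = enet_path(X, y, l1_ratio=0.5, n_alphas=50)
print("coefficient path shape (features x alphas):", coefs.shape)
```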

Journal Article
TL;DR: Practical incoherent undersampling schemes are developed and analyzed by means of their aliasing interference and demonstrate improved spatial resolution and accelerated acquisition for multislice fast spin‐echo brain imaging and 3D contrast enhanced angiography.
Abstract: The sparsity which is implicit in MR images is exploited to significantly undersample k-space. Some MR images such as angiograms are already sparse in the pixel representation; other, more complicated images have a sparse representation in some transform domain, for example, in terms of spatial finite-differences or their wavelet coefficients. According to the recently developed mathematical theory of compressed sensing, images with a sparse representation can be recovered from randomly undersampled k-space data, provided an appropriate nonlinear recovery scheme is used. Intuitively, artifacts due to random undersampling add as noise-like interference. In the sparse transform domain the significant coefficients stand out above the interference. A nonlinear thresholding scheme can recover the sparse coefficients, effectively recovering the image itself. In this article, practical incoherent undersampling schemes are developed and analyzed by means of their aliasing interference. Incoherence is introduced by pseudo-random variable-density undersampling of phase-encodes. The reconstruction is performed by minimizing the ℓ1 norm of a transformed image, subject to data fidelity constraints.

6,653 citations
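
A toy 1D analogue of the compressed-sensing reconstruction idea: recover a sparse signal from randomly undersampled Fourier ("k-space") samples by ℓ1 minimization subject to data consistency, via CVXPY. Real MRI reconstruction (variable-density sampling, wavelet or finite-difference sparsity, noise) is considerably more involved; all sizes here are assumptions.

```python
import numpy as np
import cvxpy as cp

# Sparse test signal and a random partial Fourier ("k-space") sampling operator.
n, m, k = 128, 40, 8
rng = np.random.default_rng(0)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

F = np.fft.fft(np.eye(n)) / np.sqrt(n)   # normalized DFT matrix
rows = rng.choice(n, m, replace=False)   # random undersampling pattern
A = F[rows]                              # partial Fourier measurement operator
b = A @ x_true                           # noiseless undersampled measurements

# Stack real and imaginary parts so the data-consistency constraint is real-valued.
A_ri = np.vstack([A.real, A.imag])
b_ri = np.concatenate([b.real, b.imag])

x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm1(x)), [A_ri @ x == b_ri])
prob.solve()
print("recovery error:", float(np.linalg.norm(x.value - x_true)))
```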

Journal Article
TL;DR: A novel method for sparse signal recovery that in many situations outperforms ℓ1 minimization in the sense that substantially fewer measurements are needed for exact recovery.
Abstract: It is now well understood that (1) it is possible to reconstruct sparse signals exactly from what appear to be highly incomplete sets of linear measurements and (2) this can be done by constrained ℓ1 minimization. In this paper, we study a novel method for sparse signal recovery that in many situations outperforms ℓ1 minimization in the sense that substantially fewer measurements are needed for exact recovery. The algorithm consists of solving a sequence of weighted ℓ1-minimization problems where the weights used for the next iteration are computed from the value of the current solution. We present a series of experiments demonstrating the remarkable performance and broad applicability of this algorithm in the areas of sparse signal recovery, statistical estimation, error correction and image processing. Interestingly, superior gains are also achieved when our method is applied to recover signals with assumed near-sparsity in overcomplete representations, not by reweighting the ℓ1 norm of the coefficient sequence as is common, but by reweighting the ℓ1 norm of the transformed object. An immediate consequence is the possibility of highly efficient data acquisition protocols by improving on a technique known as Compressive Sensing.

4,869 citations
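
A compact sketch of the reweighted ℓ1 iteration described in the abstract: repeatedly solve a weighted ℓ1 problem and recompute the weights from the current solution. The specific weight update 1/(|x_i| + ε), the value of ε, and the iteration count are common choices assumed here for illustration, not taken verbatim from the paper.

```python
import numpy as np
import cvxpy as cp

# Random underdetermined system with a sparse solution (sizes are illustrative).
rng = np.random.default_rng(1)
n, m, k = 100, 35, 10
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

w = np.ones(n)   # first pass reduces to ordinary l1 minimization
eps = 0.1        # keeps the weights bounded
for _ in range(5):
    x = cp.Variable(n)
    prob = cp.Problem(cp.Minimize(cp.norm1(cp.multiply(w, x))), [A @ x == b])
    prob.solve()
    w = 1.0 / (np.abs(x.value) + eps)  # small entries receive large weights next round

print("recovery error:", float(np.linalg.norm(x.value - x_true)))
```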

Journal Article
TL;DR: A new iterative recovery algorithm called CoSaMP is described that delivers the same guarantees as the best optimization-based approaches and offers rigorous bounds on computational cost and storage.

3,970 citations
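
For context, here is a standard textbook-style rendering of the CoSaMP iteration (proxy formation, support merging, least-squares estimate, pruning, residual update). It is written from the generally known form of the algorithm rather than taken from the paper, omits the stopping rules, and uses assumed problem sizes.

```python
import numpy as np

def cosamp(A, b, k, iters=20):
    """Greedy CoSaMP iteration for recovering a k-sparse x from b = A x."""
    n = A.shape[1]
    x = np.zeros(n)
    r = b.copy()
    for _ in range(iters):
        proxy = A.T @ r                                   # signal proxy
        omega = np.argsort(np.abs(proxy))[-2 * k:]        # 2k largest proxy entries
        support = np.union1d(omega, np.flatnonzero(x))    # merge with current support
        sol, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        x_ls = np.zeros(n)
        x_ls[support] = sol                               # least-squares fit on merged support
        keep = np.argsort(np.abs(x_ls))[-k:]              # prune to the k largest entries
        x = np.zeros(n)
        x[keep] = x_ls[keep]
        r = b - A @ x                                     # update the residual
    return x

# Small synthetic test (sizes are illustrative).
rng = np.random.default_rng(0)
m, n, k = 60, 200, 8
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true
print("recovery error:", float(np.linalg.norm(cosamp(A, b, k) - x_true)))
```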