Proceedings Article

Efficient L1 regularized logistic regression

16 Jul 2006-pp 401-408
TL;DR: Theoretical results show that the proposed efficient algorithm for L1 regularized logistic regression is guaranteed to converge to the global optimum, and experiments show that it significantly outperforms standard algorithms for solving convex optimization problems.
Abstract: L1 regularized logistic regression is now a workhorse of machine learning: it is widely used for many classification problems, particularly ones with many features. L1 regularized logistic regression requires solving a convex optimization problem. However, standard algorithms for solving convex optimization problems do not scale well enough to handle the large datasets encountered in many practical settings. In this paper, we propose an efficient algorithm for L1 regularized logistic regression. Our algorithm iteratively approximates the objective function by a quadratic approximation at the current point, while maintaining the L1 constraint. In each iteration, it uses the efficient LARS (Least Angle Regression) algorithm to solve the resulting L1 constrained quadratic optimization problem. Our theoretical results show that our algorithm is guaranteed to converge to the global optimum. Our experiments show that our algorithm significantly outperforms standard algorithms for solving convex optimization problems. Moreover, our algorithm outperforms four previously published algorithms that were specifically designed to solve the L1 regularized logistic regression problem.
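For reference, the problem the abstract describes can be written out explicitly (notation assumed for this note, not quoted from the paper): given m training examples (x^{(i)}, y^{(i)}) with y^{(i)} ∈ {−1, +1}, the L1 constrained logistic regression problem is

\min_{\theta}\ \frac{1}{m}\sum_{i=1}^{m} \log\!\left(1 + \exp\!\left(-y^{(i)}\,\theta^{\top} x^{(i)}\right)\right) \quad \text{subject to} \quad \|\theta\|_{1} \le C .

The iteration sketched in the abstract replaces the smooth loss \ell(\theta) by its second-order Taylor expansion at the current iterate \theta^{(k)} and hands the resulting L1 constrained quadratic problem to LARS:

\theta^{(k+1)} = \arg\min_{\|\theta\|_{1} \le C}\ \nabla\ell(\theta^{(k)})^{\top}(\theta - \theta^{(k)}) + \tfrac{1}{2}\,(\theta - \theta^{(k)})^{\top}\,\nabla^{2}\ell(\theta^{(k)})\,(\theta - \theta^{(k)}) .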


Citations
Proceedings ArticleDOI
22 Oct 2007
TL;DR: An efficient interior-point method for solving large-scale ℓ1-regularized convex loss minimization problems that uses a preconditioned conjugate gradient method to compute the search step and can solve very large problems.
Abstract: Convex loss minimization with ℓ1 regularization has been proposed as a promising method for feature selection in classification (e.g., ℓ1-regularized logistic regression) and regression (e.g., ℓ1-regularized least squares). In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized convex loss minimization problems that uses a preconditioned conjugate gradient method to compute the search step. The method can solve very large problems. For example, the method can solve an ℓ1-regularized logistic regression problem with a million features and examples (e.g., the 20 Newsgroups data set) in a few minutes, on a PC.
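One standard way to make the nondifferentiable ℓ1 term tractable for an interior-point method is to introduce bound variables; this is a sketch under assumed notation, not a quotation of the paper's formulation:

\min_{w,\,u}\ \frac{1}{m}\sum_{i=1}^{m} \phi\!\left(y_{i}\, w^{\top} x_{i}\right) + \lambda \sum_{j=1}^{n} u_{j} \quad \text{subject to} \quad -u_{j} \le w_{j} \le u_{j}, \quad j = 1, \dots, n,

where \phi is the convex loss (e.g., the logistic loss). The Newton system arising from a log-barrier treatment of these inequality constraints is the kind of linear system the preconditioned conjugate gradient step mentioned in the abstract would solve approximately.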

2 citations


Cites methods from "Efficient L1 regularized logistic r..."

  • ...When the loss function is twice differentiable, it can be solved by standard convex optimization methods such as SQP, augmented Lagrangian, interior-point, and other methods....

    [...]

01 Jan 2013
TL;DR: This thesis introduces interactive concept coverage, a general framework for personalization that incentivizes diversity, and applies in both queryless settings as well as settings requiring complex and rich user interactions, and significantly outperforms both state-of-the-art algorithms and industrial market leaders on two important personalization domains.
Abstract: We live in an era of information overload. From online news to online shopping to scholarly research, we are inundated with a torrent of information on a daily basis. With our limited time, money and attention, we often struggle to extract actionable knowledge from this deluge of data. A common approach for addressing this challenge is personalization , where results are automatically filtered to match the tastes and preferences of individual users. While showing promise, modern systems and algorithms for personalization face their own set of challenges, both technical and social in nature. On the technical side, these include the well-documented "cold start" problem, redundant result sets and an inability to move beyond simple user interactions, such as keyword queries and star ratings. From a social standpoint, studies have shown that most Americans have negative opinions of personalization, primarily due to privacy concerns. In this thesis, we address these challenges by introducing interactive concept coverage, a general framework for personalization that incentivizes diversity, and applies in both queryless settings as well as settings requiring complex and rich user interactions. This framework involves framing personalized recommendation as a probabilistic budgeted max-cover problem, where each item to be recommended is defined to probabilistically cover one or more concepts. From user interaction, we learn weights on concepts and affinities for items, such that solving the resulting optimization problem results in personalized, diverse recommendations. Theoretical properties of our framework guarantee efficient, near-optimal solutions to our objective function, and no-regret learning of user preferences. We show that, by using the interactive concept coverage methodology, we are able to significantly outperform both state-of-the-art algorithms and industrial market leaders on two important personalization domains: news recommendation and scientific literature discovery. Empirical evaluations—including live user studies—demonstrate that our approach produces more diverse, more relevant and more trustworthy results than leading competitors, with minimal burden on the user. Finally, we show that we can directly use our framework to introduce a level of transparency to personalization that gives users the opportunity to understand and directly interpret (and correct) how the system views them. By successfully addressing many of the social and technical challenges of personalization, we believe the work in this thesis takes an important step in ameliorating problems of information overload.
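The "probabilistic budgeted max-cover" framing can be illustrated with a simple greedy selection sketch. The function name, the item/concept toy data, and the cost-scaled marginal-gain rule below are illustrative assumptions, not the thesis's actual algorithm:

def greedy_budgeted_cover(items, concept_weight, cost, budget):
    # items: dict mapping each candidate item to the set of concepts it covers.
    # concept_weight: learned per-concept weights (user preferences).
    # cost: per-item cost; budget: total cost allowed.
    selected, covered, spent = [], set(), 0.0
    while True:
        best_item, best_gain = None, 0.0
        for item, concepts in items.items():
            if item in selected or spent + cost[item] > budget:
                continue
            # Marginal gain: weight of newly covered concepts per unit cost.
            gain = sum(concept_weight.get(c, 0.0) for c in concepts - covered) / cost[item]
            if gain > best_gain:
                best_item, best_gain = item, gain
        if best_item is None:
            break
        selected.append(best_item)
        covered |= items[best_item]
        spent += cost[best_item]
    return selected

# Toy usage: recommend up to two unit-cost articles that cover diverse concepts.
articles = {"a1": {"politics", "economy"}, "a2": {"economy"}, "a3": {"sports"}}
weights = {"politics": 0.6, "economy": 0.3, "sports": 0.1}
costs = {"a1": 1.0, "a2": 1.0, "a3": 1.0}
print(greedy_budgeted_cover(articles, weights, costs, budget=2.0))  # ['a1', 'a3']

Greedy selection by highest marginal gain per unit cost is the usual route to the kind of efficient, near-optimal guarantees the abstract mentions for coverage-style objectives.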

2 citations


Cites background from "Efficient L1 regularized logistic r..."

  • ...…ℓ1 penalty term can be found in many other objective functions throughout machine learning, including logistic regression [Koh et al., 2007, Lee et al., 2006], sparse coding [Olshausen and Field, 1996] and dictionary learning [Mairal et al., 2010], all with the incentive of producing a…...

    [...]

Proceedings ArticleDOI
26 May 2013
TL;DR: The proposed method reduces the Lp problem to an L1 regularized one by transforming the target variables with an Lp-based mapping, and optimizes it with an orthant-wise approach, without reformulating it into an iterative reweighting scheme.
Abstract: Sparsity induced in the optimized weights works effectively for factorization with robustness to noise and for classification with feature selection. To enhance sparsity, L1 regularization is introduced into the objective cost function to be minimized. In general, however, Lp (p<1) regularization leads to sparser solutions than L1, though the Lp regularized problem is difficult to optimize effectively. In this paper, we propose a method to efficiently optimize the Lp regularized problem. The method reduces the Lp problem to an L1 regularized one by transforming the target variables with a mapping based on Lp, and optimizes it using an orthant-wise approach. In the proposed method, the Lp problem is directly optimized for computational efficiency, without reformulating it into an iterative reweighting scheme. The proposed method is generally applicable to various problems with Lp regularization, such as factorization and classification. In experiments on classification using logistic regression and factorization based on least squares, the proposed method produces favorable sparse results.
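For context, the two penalties being compared can be written generically (assumed notation; the paper's specific Lp-based mapping to an L1 problem is not spelled out in the abstract):

\min_{w}\ f(w) + \lambda \|w\|_{1} \qquad \text{versus} \qquad \min_{w}\ f(w) + \lambda \sum_{j} |w_{j}|^{p}, \quad 0 < p < 1,

where f is, for example, a logistic loss for classification or a least-squares factorization objective. The nonconvex Lp penalty typically yields sparser solutions than the convex L1 penalty, at the price of a harder optimization problem.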

2 citations


Cites background or result from "Efficient L1 regularized logistic r..."

  • ...The results are shown in Table 1 with comparison to L1-regularized LR [2]....

    [...]

  • ...Sparsity induced models have attracted keen attention in the fields of signal processing, such as for factorization [1], pattern classification [2] and computer vision [3]....

    [...]

Dissertation
26 Aug 2014
TL;DR: This thesis proposes a novel augmented Lagrangian method for solving the l1-norm relaxation problems of the original l0 minimization problems and applies it to the proposed formulation of sparse principal component analysis (PCA), and establishes some convergence results for both inner and outer methods.
Abstract: In the last two decades, there have been numerous applications in which sparse solutions are sought. Mathematically, all these applications can be formulated as l0 minimization problems. In this thesis, we first propose a novel augmented Lagrangian (AL) method for solving the l1-norm relaxation problems of the original l0 minimization problems and apply it to our proposed formulation of sparse principal component analysis (PCA). We next propose penalty decomposition (PD) methods for solving the original l0 minimization problems, in which a sequence of penalty subproblems is solved by a block coordinate descent (BCD) method. For the AL method, we show that under some regularity assumptions, it converges to a stationary point. Additionally, we propose two nonmonotone gradient methods for solving the AL subproblems, and establish their global and local convergence. Moreover, we apply the AL method to our proposed formulation of sparse PCA and compare our approach with several existing methods on synthetic, Pitprops, and gene expression data. The computational results demonstrate that the sparse principal components (PCs) produced by our approach substantially outperform those by other methods in terms of total explained variance, correlation of PCs, and orthogonality of loading vectors. For the PD methods, under some suitable assumptions, we establish convergence results for both the inner (BCD) and outer (PD) iterations. We test the performance of our PD methods by applying them to sparse logistic regression, sparse inverse covariance selection, and compressed sensing problems. The computational results demonstrate that when solutions of the same cardinality are sought, our approach applied to the l0-based models generally has better solution quality and/or speed than the existing approaches applied to the corresponding l1-based models. Finally, we adapt the PD method to solve our proposed wavelet frame based image
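The relationship between the l0 problems and the l1-norm relaxations mentioned in the abstract follows the usual pattern (generic forms with assumed notation; the thesis's exact formulations may differ):

\min_{x}\ f(x) + \nu \|x\|_{0} \qquad \leadsto \qquad \min_{x}\ f(x) + \lambda \|x\|_{1},

where \|x\|_{0} counts the nonzero entries of x and the convex norm \|x\|_{1} serves as its tractable surrogate.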

2 citations


Cites background from "Efficient L1 regularized logistic r..."

  • ...for some regularization parameter λ ≥ 0 (see, for example, [80, 54, 102, 82, 85, 119])....

    [...]

Proceedings ArticleDOI
11 Jul 2016
TL;DR: Experimental results demonstrate that the proposed algorithm, which uses l1-norm regularized logistic regression to obtain a sparse representation of randomness features and a Fuzzy Gaussian Mixture Model to improve identification accuracy, can be adopted as an effective technique for encrypted data stream identification.
Abstract: The accurate identification of encrypted data streams helps to regulate illegal data, detect network attacks and protect users' information. In this paper, a novel encrypted data stream identification algorithm is introduced. The proposed method is based on randomness characteristics of the encrypted data stream. We use l1-norm regularized logistic regression to improve the sparse representation of randomness features and a Fuzzy Gaussian Mixture Model (FGMM) to improve identification accuracy. Experimental results demonstrate that the method can be adopted as an effective technique for encrypted data stream identification.
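As a rough illustration of the l1-norm regularized logistic regression step, here is a minimal sketch; the placeholder data, feature interpretation, and choice of scikit-learn are assumptions for the example, not the authors' implementation:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data: rows are data streams, columns are randomness-test
# statistics; labels mark encrypted (1) vs. plaintext (0) streams.
rng = np.random.default_rng(0)
X = rng.random((200, 30))
y = rng.integers(0, 2, size=200)

# The l1 penalty drives many coefficients to exactly zero, giving the
# sparse representation of randomness features described in the abstract.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X, y)
selected = np.flatnonzero(clf.coef_[0])  # indices of the retained features
print("features kept:", selected)

The retained features could then be passed to a separate mixture-model classifier, which is the role the FGMM plays in the paper.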

2 citations


Cites background from "Efficient L1 regularized logistic r..."

  • ...[13] Lee, Su-In, Honglak Lee, Pieter Abbeel, and Andrew Y....

    [...]

  • ...Setting ρ = 0 reduces the problem to Logistic Regression, which minimizes the empirical loss term [13-14]....

    [...]

References
More filters
Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
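In symbols (standard lasso notation, not quoted from the paper), the estimator described above is

\hat{\beta} = \arg\min_{\beta}\ \sum_{i=1}^{n}\Bigl(y_{i} - \beta_{0} - \sum_{j=1}^{p} \beta_{j} x_{ij}\Bigr)^{2} \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_{j}| \le t,

and it is the constraint \sum_{j} |\beta_{j}| \le t that forces some coefficients to be exactly zero, which is what makes the fitted models interpretable.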

40,785 citations


"EfficientL 1 regularized logistic r..." refers methods in this paper

  • ...(Tibshirani 1996) Several algorithms have been developed to solve L1 constrained least squares problems....

    [...]

  • ...(See Tibshirani (1996) for details.)...

    [...]


Book
01 Mar 2004
TL;DR: A comprehensive introduction to convex optimization, focusing on recognizing convex optimization problems and then finding the most appropriate technique for solving them.
Abstract: Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.

33,341 citations

Book
01 Jan 1983
TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).
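For the binomial (logistic) case, the iterative weighted linear regression described above takes the familiar IRLS form; this is a standard textbook statement with assumed notation, added here for concreteness. With \mu_{i}^{(k)} = \bigl(1 + \exp(-x_{i}^{\top}\beta^{(k)})\bigr)^{-1},

W^{(k)} = \operatorname{diag}\bigl(\mu_{i}^{(k)}(1 - \mu_{i}^{(k)})\bigr), \qquad z^{(k)} = X\beta^{(k)} + \bigl(W^{(k)}\bigr)^{-1}\bigl(y - \mu^{(k)}\bigr), \qquad \beta^{(k+1)} = \bigl(X^{\top} W^{(k)} X\bigr)^{-1} X^{\top} W^{(k)} z^{(k)},

so each step is a weighted least-squares fit to the working response z^{(k)}.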

23,215 citations

01 Jan 1998

12,940 citations


"EfficientL 1 regularized logistic r..." refers methods in this paper

  • ...We tested each algorithm’s performance on 12 different datasets, consisting of 9 UCI datasets (Newman et al. 1998), one artificial dataset called Madelon from the NIPS 2003 workshop on feature extraction, and two gene expression datasets (Microarray 1 and 2). Table 2 gives details on the number…...

    [...]

  • ...We tested each algorithm’s performance on 12 different real datasets, consisting of 9 UCI datasets (Newman et al. 1998) and 3 gene expression datasets (Microarray 1, 2 and 3)....

    [...]

Journal ArticleDOI
TL;DR: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences, and it is thoroughly enjoyable to read.
Abstract: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences. Subtitled “With Applications in Engineering and the Sciences,” this book’s authors all specialize primarily in engineering statistics. The first author has produced several recent editions of Walpole, Myers, and Myers (1998), the last reported by Ziegel (1999). The second author has had several editions of Montgomery and Runger (1999), recently reported by Ziegel (2002). All of the authors are renowned experts in modeling. The first two authors collaborated on a seminal volume in applied modeling (Myers and Montgomery 2002), which had its recent revised edition reported by Ziegel (2002). The last two authors collaborated on the most recent edition of a book on regression analysis (Montgomery, Peck, and Vining 2001), reported by Gray (2002), and the first author has had multiple editions of his own regression analysis book (Myers 1990), the latest of which was reported by Ziegel (1991). A comparable book with similar objectives and a more specific focus on logistic regression, Hosmer and Lemeshow (2000), reported by Conklin (2002), presumed a background in regression analysis and began with generalized linear models. The Preface here (p. xi) indicates an identical requirement but nonetheless begins with 100 pages of material on linear and nonlinear regression. Most of this will probably be a review for the readers of the book. Chapter 2, “Linear Regression Model,” begins with 50 pages of familiar material on estimation, inference, and diagnostic checking for multiple regression. The approach is very traditional, including the use of formal hypothesis tests. In industrial settings, use of p values as part of a risk-weighted decision is generally more appropriate. The pedagogical approach includes formulas and demonstrations for computations, although computing by Minitab is eventually illustrated. Less-familiar material on maximum likelihood estimation, scaled residuals, and weighted least squares provides more specific background for subsequent estimation methods for generalized linear models. This review is not meant to be disparaging. The authors have packed a wealth of useful nuggets for any practitioner in this chapter. It is thoroughly enjoyable to read. Chapter 3, “Nonlinear Regression Models,” is arguably less of a review, because regression analysis courses often give short shrift to nonlinear models. The chapter begins with a great example on the pitfalls of linearizing a nonlinear model for parameter estimation. It continues with the effective balancing of explicit statements concerning the theoretical basis for computations versus the application and demonstration of their use. The details of maximum likelihood estimation are again provided, and weighted and generalized regression estimation are discussed. Chapter 4 is titled “Logistic and Poisson Regression Models.” Logistic regression provides the basic model for generalized linear models. The prior development for weighted regression is used to motivate maximum likelihood estimation for the parameters in the logistic model. The algebraic details are provided. As in the development for linear models, some of the details are pushed into an appendix. In addition to connecting to the foregoing material on regression on several occasions, the authors link their development forward to their following chapter on the entire family of generalized linear models.
They discuss score functions, the variance-covariance matrix, Wald inference, likelihood inference, deviance, and overdispersion. Careful explanations are given for the values provided in standard computer software, here PROC LOGISTIC in SAS. The value in having the book begin with familiar regression concepts is clearly realized when the analogies are drawn between overdispersion and nonhomogeneous variance, or analysis of deviance and analysis of variance. The authors rely on the similarity of Poisson regression methods to logistic regression methods and mostly present illustrations for Poisson regression. These use PROC GENMOD in SAS. The book does not give any of the SAS code that produces the results. Two of the examples illustrate designed experiments and modeling. They include discussion of subset selection and adjustment for overdispersion. The mathematical level of the presentation is elevated in Chapter 5, “The Family of Generalized Linear Models.” First, the authors unify the two preceding chapters under the exponential distribution. The material on the formal structure for generalized linear models (GLMs), likelihood equations, quasilikelihood, the gamma distribution family, and power functions as links is some of the most advanced material in the book. Most of the computational details are relegated to appendixes. A discussion of residuals returns one to a more practical perspective, and two long examples on gamma distribution applications provide excellent guidance on how to put this material into practice. One example is a contrast to the use of linear regression with a log transformation of the response, and the other is a comparison to the use of a different link function in the previous chapter. Chapter 6 considers generalized estimating equations (GEEs) for longitudinal and analogous studies. The first half of the chapter presents the methodology, and the second half demonstrates its application through five different examples. The basis for the general situation is first established using the case with a normal distribution for the response and an identity link. The importance of the correlation structure is explained, the iterative estimation procedure is shown, and estimation for the scale parameters and the standard errors of the coefficients is discussed. The procedures are then generalized for the exponential family of distributions and quasi-likelihood estimation. Two of the examples are standard repeated-measures illustrations from biostatistical applications, but the last three illustrations are all interesting reworkings of industrial applications. The GEE computations in PROC GENMOD are applied to account for correlations that occur with multiple measurements on the subjects or restrictions to randomizations. The examples show that accounting for correlation structure can result in different conclusions. Chapter 7, “Further Advances and Applications in GLM,” discusses several additional topics. These are experimental designs for GLMs, asymptotic results, analysis of screening experiments, data transformation, modeling for both a process mean and variance, and generalized additive models. The material on experimental designs is more discursive than prescriptive and as a result is also somewhat theoretical. Similar comments apply for the discussion on the quality of the asymptotic results, which wallows a little too much in reports on various simulation studies.
The examples on screening and data transformations experiments are again reworkings of analyses of familiar industrial examples and another obvious motivation for the enthusiasm that the authors have developed for using the GLM toolkit. One can hope that subsequent editions will similarly contain new examples that will have caused the authors to expand the material on generalized additive models and other topics in this chapter. Designating myself to review a book that I know I will love to read is one of the rewards of being editor. I read both of the editions of McCullagh and Nelder (1989), which was reviewed by Schuenemeyer (1992). That book was not fun to read. The obvious enthusiasm of Myers, Montgomery, and Vining and their reliance on their many examples as a major focus of their pedagogy make Generalized Linear Models a joy to read. Every statistician working in any area of applied science should buy it and experience the excitement of these new approaches to familiar activities.

10,520 citations


Additional excerpts

  • ...(Nelder & Wedderburn 1972; McCullagh & Nelder 1989)...

    [...]
