Proceedings Article

Efficient L1 regularized logistic regression

16 Jul 2006-pp 401-408
TL;DR: Theoretical results show that the proposed efficient algorithm for L1 regularized logistic regression is guaranteed to converge to the global optimum, and experiments show that it significantly outperforms standard algorithms for solving convex optimization problems.
Abstract: L1 regularized logistic regression is now a workhorse of machine learning: it is widely used for many classification problems, particularly ones with many features. L1 regularized logistic regression requires solving a convex optimization problem. However, standard algorithms for solving convex optimization problems do not scale well enough to handle the large datasets encountered in many practical settings. In this paper, we propose an efficient algorithm for L1 regularized logistic regression. Our algorithm iteratively approximates the objective function by a quadratic approximation at the current point, while maintaining the L1 constraint. In each iteration, it uses the efficient LARS (Least Angle Regression) algorithm to solve the resulting L1 constrained quadratic optimization problem. Our theoretical results show that our algorithm is guaranteed to converge to the global optimum. Our experiments show that our algorithm significantly outperforms standard algorithms for solving convex optimization problems. Moreover, our algorithm outperforms four previously published algorithms that were specifically designed to solve the L1 regularized logistic regression problem.
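For reference, the problem the abstract describes can be written out explicitly (notation assumed for this note, not quoted from the paper): given m training examples (x^{(i)}, y^{(i)}) with y^{(i)} ∈ {−1, +1}, the L1 constrained logistic regression problem is

\min_{\theta}\ \frac{1}{m}\sum_{i=1}^{m} \log\!\left(1 + \exp\!\left(-y^{(i)}\,\theta^{\top} x^{(i)}\right)\right) \quad \text{subject to} \quad \|\theta\|_{1} \le C .

The iteration sketched in the abstract replaces the smooth loss \ell(\theta) by its second-order Taylor expansion at the current iterate \theta^{(k)} and hands the resulting L1 constrained quadratic problem to LARS:

\theta^{(k+1)} = \arg\min_{\|\theta\|_{1} \le C}\ \nabla\ell(\theta^{(k)})^{\top}(\theta - \theta^{(k)}) + \tfrac{1}{2}\,(\theta - \theta^{(k)})^{\top}\,\nabla^{2}\ell(\theta^{(k)})\,(\theta - \theta^{(k)}) .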


Citations
Proceedings ArticleDOI
22 Oct 2007
TL;DR: An efficient interior-point method for solving large-scale ℓ1-regularized convex loss minimization problems that uses a preconditioned conjugate gradient method to compute the search step and can solve very large problems.
Abstract: Convex loss minimization with ℓ1 regularization has been proposed as a promising method for feature selection in classification (e.g., ℓ1-regularized logistic regression) and regression (e.g., ℓ1-regularized least squares). In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized convex loss minimization problems that uses a preconditioned conjugate gradient method to compute the search step. The method can solve very large problems. For example, the method can solve an ℓ1-regularized logistic regression problem with a million features and examples (e.g., the 20 Newsgroups data set) in a few minutes, on a PC.
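One standard way to make the nondifferentiable ℓ1 term tractable for an interior-point method is to introduce bound variables; this is a sketch under assumed notation, not a quotation of the paper's formulation:

\min_{w,\,u}\ \frac{1}{m}\sum_{i=1}^{m} \phi\!\left(y_{i}\, w^{\top} x_{i}\right) + \lambda \sum_{j=1}^{n} u_{j} \quad \text{subject to} \quad -u_{j} \le w_{j} \le u_{j}, \quad j = 1, \dots, n,

where \phi is the convex loss (e.g., the logistic loss). The Newton system arising from a log-barrier treatment of these inequality constraints is the kind of linear system the preconditioned conjugate gradient step mentioned in the abstract would solve approximately.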

2 citations


Cites methods from "Efficient L1 regularized logistic r..."

  • ...When the loss function is twice differentiable, it can be solved by standard convex optimization methods such as SQP, augmented Lagrangian, interior-point, and other methods....

    [...]

01 Jan 2013
TL;DR: This thesis introduces interactive concept coverage, a general framework for personalization that incentivizes diversity, and applies in both queryless settings as well as settings requiring complex and rich user interactions, and significantly outperforms both state-of-the-art algorithms and industrial market leaders on two important personalization domains.
Abstract: We live in an era of information overload. From online news to online shopping to scholarly research, we are inundated with a torrent of information on a daily basis. With our limited time, money and attention, we often struggle to extract actionable knowledge from this deluge of data. A common approach for addressing this challenge is personalization , where results are automatically filtered to match the tastes and preferences of individual users. While showing promise, modern systems and algorithms for personalization face their own set of challenges, both technical and social in nature. On the technical side, these include the well-documented "cold start" problem, redundant result sets and an inability to move beyond simple user interactions, such as keyword queries and star ratings. From a social standpoint, studies have shown that most Americans have negative opinions of personalization, primarily due to privacy concerns. In this thesis, we address these challenges by introducing interactive concept coverage, a general framework for personalization that incentivizes diversity, and applies in both queryless settings as well as settings requiring complex and rich user interactions. This framework involves framing personalized recommendation as a probabilistic budgeted max-cover problem, where each item to be recommended is defined to probabilistically cover one or more concepts. From user interaction, we learn weights on concepts and affinities for items, such that solving the resulting optimization problem results in personalized, diverse recommendations. Theoretical properties of our framework guarantee efficient, near-optimal solutions to our objective function, and no-regret learning of user preferences. We show that, by using the interactive concept coverage methodology, we are able to significantly outperform both state-of-the-art algorithms and industrial market leaders on two important personalization domains: news recommendation and scientific literature discovery. Empirical evaluations—including live user studies—demonstrate that our approach produces more diverse, more relevant and more trustworthy results than leading competitors, with minimal burden on the user. Finally, we show that we can directly use our framework to introduce a level of transparency to personalization that gives users the opportunity to understand and directly interpret (and correct) how the system views them. By successfully addressing many of the social and technical challenges of personalization, we believe the work in this thesis takes an important step in ameliorating problems of information overload.
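The "probabilistic budgeted max-cover" framing can be illustrated with a simple greedy selection sketch. The function name, the item/concept toy data, and the cost-scaled marginal-gain rule below are illustrative assumptions, not the thesis's actual algorithm:

def greedy_budgeted_cover(items, concept_weight, cost, budget):
    # items: dict mapping each candidate item to the set of concepts it covers.
    # concept_weight: learned per-concept weights (user preferences).
    # cost: per-item cost; budget: total cost allowed.
    selected, covered, spent = [], set(), 0.0
    while True:
        best_item, best_gain = None, 0.0
        for item, concepts in items.items():
            if item in selected or spent + cost[item] > budget:
                continue
            # Marginal gain: weight of newly covered concepts per unit cost.
            gain = sum(concept_weight.get(c, 0.0) for c in concepts - covered) / cost[item]
            if gain > best_gain:
                best_item, best_gain = item, gain
        if best_item is None:
            break
        selected.append(best_item)
        covered |= items[best_item]
        spent += cost[best_item]
    return selected

# Toy usage: recommend up to two unit-cost articles that cover diverse concepts.
articles = {"a1": {"politics", "economy"}, "a2": {"economy"}, "a3": {"sports"}}
weights = {"politics": 0.6, "economy": 0.3, "sports": 0.1}
costs = {"a1": 1.0, "a2": 1.0, "a3": 1.0}
print(greedy_budgeted_cover(articles, weights, costs, budget=2.0))  # ['a1', 'a3']

Greedy selection by highest marginal gain per unit cost is the usual route to the kind of efficient, near-optimal guarantees the abstract mentions for coverage-style objectives.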

2 citations


Cites background from "Efficient L1 regularized logistic r..."

  • ...…ℓ1 penalty term can be found in many other objective functions throughout machine learning, including logistic regression [Koh et al., 2007, Lee et al., 2006], sparse coding [Olshausen and Field, 1996] and dictionary learning [Mairal et al., 2010], all with the incentive of producing a…...

    [...]

Proceedings ArticleDOI
26 May 2013
TL;DR: The proposed method reduces the Lp problem to an L1 regularized one by transforming the target variables with an Lp-based mapping, and optimizes it with an orthant-wise approach, without reformulating it into an iterative reweighting scheme.
Abstract: Sparsity induced in the optimized weights works effectively for factorization with robustness to noise and for classification with feature selection. To enhance sparsity, L1 regularization is introduced into the objective cost function to be minimized. In general, however, Lp (p<1) regularization leads to sparser solutions than L1, though the Lp regularized problem is difficult to optimize effectively. In this paper, we propose a method to efficiently optimize the Lp regularized problem. The method reduces the Lp problem to an L1 regularized one by transforming the target variables with a mapping based on Lp, and optimizes it using an orthant-wise approach. In the proposed method, the Lp problem is directly optimized for computational efficiency, without reformulating it into an iterative reweighting scheme. The proposed method is generally applicable to various problems with Lp regularization, such as factorization and classification. In experiments on classification using logistic regression and factorization based on least squares, the proposed method produces favorable sparse results.
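For context, the two penalties being compared can be written generically (assumed notation; the paper's specific Lp-based mapping to an L1 problem is not spelled out in the abstract):

\min_{w}\ f(w) + \lambda \|w\|_{1} \qquad \text{versus} \qquad \min_{w}\ f(w) + \lambda \sum_{j} |w_{j}|^{p}, \quad 0 < p < 1,

where f is, for example, a logistic loss for classification or a least-squares factorization objective. The nonconvex Lp penalty typically yields sparser solutions than the convex L1 penalty, at the price of a harder optimization problem.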

2 citations


Cites background or result from "Efficient L1 regularized logistic r..."

  • ...The results are shown in Table 1 with comparison to L1-regularized LR [2]....

    [...]

  • ...Sparsity induced models have attracted keen attention in the fields of signal processing, such as for factorization [1], pattern classification [2] and computer vision [3]....

    [...]

Dissertation
26 Aug 2014
TL;DR: This thesis proposes a novel augmented Lagrangian method for solving the l1-norm relaxation problems of the original l0 minimization problems and applies it to the proposed formulation of sparse principal component analysis (PCA), and establishes some convergence results for both inner and outer methods.
Abstract: In the last two decades, there have been numerous applications in which sparse solutions are sought. Mathematically, all these applications can be formulated as l0 minimization problems. In this thesis, we first propose a novel augmented Lagrangian (AL) method for solving the l1-norm relaxation problems of the original l0 minimization problems and apply it to our proposed formulation of sparse principal component analysis (PCA). We next propose penalty decomposition (PD) methods for solving the original l0 minimization problems, in which a sequence of penalty subproblems is solved by a block coordinate descent (BCD) method. For the AL method, we show that under some regularity assumptions, it converges to a stationary point. Additionally, we propose two nonmonotone gradient methods for solving the AL subproblems, and establish their global and local convergence. Moreover, we apply the AL method to our proposed formulation of sparse PCA and compare our approach with several existing methods on synthetic, Pitprops, and gene expression data. The computational results demonstrate that the sparse principal components (PCs) produced by our approach substantially outperform those by other methods in terms of total explained variance, correlation of PCs, and orthogonality of loading vectors. For the PD methods, under some suitable assumptions, we establish convergence results for both the inner (BCD) and outer (PD) iterations. We test the performance of our PD methods by applying them to sparse logistic regression, sparse inverse covariance selection, and compressed sensing problems. The computational results demonstrate that when solutions of the same cardinality are sought, our approach applied to the l0-based models generally has better solution quality and/or speed than the existing approaches applied to the corresponding l1-based models. Finally, we adapt the PD method to solve our proposed wavelet frame based image
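The relationship between the l0 problems and the l1-norm relaxations mentioned in the abstract follows the usual pattern (generic forms with assumed notation; the thesis's exact formulations may differ):

\min_{x}\ f(x) + \nu \|x\|_{0} \qquad \leadsto \qquad \min_{x}\ f(x) + \lambda \|x\|_{1},

where \|x\|_{0} counts the nonzero entries of x and the convex norm \|x\|_{1} serves as its tractable surrogate.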

2 citations


Cites background from "Efficient L1 regularized logistic r..."

  • ...for some regularization parameter λ ≥ 0 (see, for example, [80, 54, 102, 82, 85, 119])....

    [...]

Proceedings ArticleDOI
11 Jul 2016
TL;DR: Experimental results demonstrate that the proposed algorithm, which uses l1-norm regularized logistic regression to obtain a sparse representation of randomness features and a Fuzzy Gaussian Mixture Model to improve identification accuracy, can be adopted as an effective technique for encrypted data stream identification.
Abstract: The accurate identification of encrypted data streams helps to regulate illegal data, detect network attacks and protect users' information. In this paper, a novel encrypted data stream identification algorithm is introduced. The proposed method is based on randomness characteristics of the encrypted data stream. We use l1-norm regularized logistic regression to improve the sparse representation of randomness features and a Fuzzy Gaussian Mixture Model (FGMM) to improve identification accuracy. Experimental results demonstrate that the method can be adopted as an effective technique for encrypted data stream identification.
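As a rough illustration of the l1-norm regularized logistic regression step, here is a minimal sketch; the placeholder data, feature interpretation, and choice of scikit-learn are assumptions for the example, not the authors' implementation:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data: rows are data streams, columns are randomness-test
# statistics; labels mark encrypted (1) vs. plaintext (0) streams.
rng = np.random.default_rng(0)
X = rng.random((200, 30))
y = rng.integers(0, 2, size=200)

# The l1 penalty drives many coefficients to exactly zero, giving the
# sparse representation of randomness features described in the abstract.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X, y)
selected = np.flatnonzero(clf.coef_[0])  # indices of the retained features
print("features kept:", selected)

The retained features could then be passed to a separate mixture-model classifier, which is the role the FGMM plays in the paper.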

2 citations


Cites background from "Efficient L1 regularized logistic r..."

  • ...[13] Lee, Su-In, Honglak Lee, Pieter Abbeel, and Andrew Y....

    [...]

  • ...Setting ρ = 0 reduces the problem to Logistic Regression, which minimizes the empirical loss term [13-14]....

    [...]

References
More filters
Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
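In symbols (standard lasso notation, not quoted from the paper), the estimator described above is

\hat{\beta} = \arg\min_{\beta}\ \sum_{i=1}^{n}\Bigl(y_{i} - \beta_{0} - \sum_{j=1}^{p} \beta_{j} x_{ij}\Bigr)^{2} \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_{j}| \le t,

and it is the constraint \sum_{j} |\beta_{j}| \le t that forces some coefficients to be exactly zero, which is what makes the fitted models interpretable.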

40,785 citations


"EfficientL 1 regularized logistic r..." refers methods in this paper

  • ...(Tibshirani 1996) Several algorithms have been developed to solve L1 constrained least squares problems....

    [...]

  • ...(See Tibshirani (1996) for details.)...

    [...]


Book
01 Mar 2004
TL;DR: A comprehensive introduction to convex optimization, focusing on recognizing convex optimization problems and then finding the most appropriate technique for solving them.
Abstract: Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.

33,341 citations

Book
01 Jan 1983
TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).
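For the binomial (logistic) case, the iterative weighted linear regression described above takes the familiar IRLS form; this is a standard textbook statement with assumed notation, added here for concreteness. With \mu_{i}^{(k)} = \bigl(1 + \exp(-x_{i}^{\top}\beta^{(k)})\bigr)^{-1},

W^{(k)} = \operatorname{diag}\bigl(\mu_{i}^{(k)}(1 - \mu_{i}^{(k)})\bigr), \qquad z^{(k)} = X\beta^{(k)} + \bigl(W^{(k)}\bigr)^{-1}\bigl(y - \mu^{(k)}\bigr), \qquad \beta^{(k+1)} = \bigl(X^{\top} W^{(k)} X\bigr)^{-1} X^{\top} W^{(k)} z^{(k)},

so each step is a weighted least-squares fit to the working response z^{(k)}.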

23,215 citations

01 Jan 1998

12,940 citations


"EfficientL 1 regularized logistic r..." refers methods in this paper

  • ...We tested each algorithm’s performance on 12 different datasets, consisting of 9 UCI datasets (Newman et al. 1998), one artificial dataset called Madelon from the NIPS 2003 workshop on feature extraction, and two gene expression datasets (Microarray 1 and 2). Table 2 gives details on the number…...

    [...]

  • ...We tested each algorithm’s performance on 12 different real datasets, consisting of 9 UCI datasets (Newman et al. 1998) and 3 gene expression datasets (Microarray 1, 2 and 3)....

    [...]

Journal ArticleDOI
TL;DR: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences, and it is thoroughly enjoyable to read.
Abstract: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences. Subtitled “With Applications in Engineering and the Sciences,” this book’s authors all specialize primarily in engineering statistics. The first author has produced several recent editions of Walpole, Myers, and Myers (1998), the last reported by Ziegel (1999). The second author has had several editions of Montgomery and Runger (1999), recently reported by Ziegel (2002). All of the authors are renowned experts in modeling. The first two authors collaborated on a seminal volume in applied modeling (Myers and Montgomery 2002), which had its recent revised edition reported by Ziegel (2002). The last two authors collaborated on the most recent edition of a book on regression analysis (Montgomery, Peck, and Vining 2001), reported by Gray (2002), and the first author has had multiple editions of his own regression analysis book (Myers 1990), the latest of which was reported by Ziegel (1991). A comparable book with similar objectives and a more specific focus on logistic regression, Hosmer and Lemeshow (2000), reported by Conklin (2002), presumed a background in regression analysis and began with generalized linear models. The Preface here (p. xi) indicates an identical requirement but nonetheless begins with 100 pages of material on linear and nonlinear regression. Most of this will probably be a review for the readers of the book. Chapter 2, “Linear Regression Model,” begins with 50 pages of familiar material on estimation, inference, and diagnostic checking for multiple regression. The approach is very traditional, including the use of formal hypothesis tests. In industrial settings, use of p values as part of a risk-weighted decision is generally more appropriate. The pedagogical approach includes formulas and demonstrations for computations, although computing by Minitab is eventually illustrated. Less-familiar material on maximum likelihood estimation, scaled residuals, and weighted least squares provides more specific background for subsequent estimation methods for generalized linear models. This review is not meant to be disparaging. The authors have packed a wealth of useful nuggets for any practitioner in this chapter. It is thoroughly enjoyable to read. Chapter 3, “Nonlinear Regression Models,” is arguably less of a review, because regression analysis courses often give short shrift to nonlinear models. The chapter begins with a great example on the pitfalls of linearizing a nonlinear model for parameter estimation. It continues with the effective balancing of explicit statements concerning the theoretical basis for computations versus the application and demonstration of their use. The details of maximum likelihood estimation are again provided, and weighted and generalized regression estimation are discussed. Chapter 4 is titled “Logistic and Poisson Regression Models.” Logistic regression provides the basic model for generalized linear models. The prior development for weighted regression is used to motivate maximum likelihood estimation for the parameters in the logistic model. The algebraic details are provided. As in the development for linear models, some of the details are pushed into an appendix. In addition to connecting to the foregoing material on regression on several occasions, the authors link their development forward to their following chapter on the entire family of generalized linear models.
They discuss score functions, the variance-covariance matrix, Wald inference, likelihood inference, deviance, and overdispersion. Careful explanations are given for the values provided in standard computer software, here PROC LOGISTIC in SAS. The value in having the book begin with familiar regression concepts is clearly realized when the analogies are drawn between overdispersion and nonhomogeneous variance, or analysis of deviance and analysis of variance. The authors rely on the similarity of Poisson regression methods to logistic regression methods and mostly present illustrations for Poisson regression. These use PROC GENMOD in SAS. The book does not give any of the SAS code that produces the results. Two of the examples illustrate designed experiments and modeling. They include discussion of subset selection and adjustment for overdispersion. The mathematical level of the presentation is elevated in Chapter 5, “The Family of Generalized Linear Models.” First, the authors unify the two preceding chapters under the exponential distribution. The material on the formal structure for generalized linear models (GLMs), likelihood equations, quasilikelihood, the gamma distribution family, and power functions as links is some of the most advanced material in the book. Most of the computational details are relegated to appendixes. A discussion of residuals returns one to a more practical perspective, and two long examples on gamma distribution applications provide excellent guidance on how to put this material into practice. One example is a contrast to the use of linear regression with a log transformation of the response, and the other is a comparison to the use of a different link function in the previous chapter. Chapter 6 considers generalized estimating equations (GEEs) for longitudinal and analogous studies. The first half of the chapter presents the methodology, and the second half demonstrates its application through five different examples. The basis for the general situation is first established using the case with a normal distribution for the response and an identity link. The importance of the correlation structure is explained, the iterative estimation procedure is shown, and estimation for the scale parameters and the standard errors of the coefficients is discussed. The procedures are then generalized for the exponential family of distributions and quasi-likelihood estimation. Two of the examples are standard repeated-measures illustrations from biostatistical applications, but the last three illustrations are all interesting reworkings of industrial applications. The GEE computations in PROC GENMOD are applied to account for correlations that occur with multiple measurements on the subjects or restrictions to randomizations. The examples show that accounting for correlation structure can result in different conclusions. Chapter 7, “Further Advances and Applications in GLM,” discusses several additional topics. These are experimental designs for GLMs, asymptotic results, analysis of screening experiments, data transformation, modeling for both a process mean and variance, and generalized additive models. The material on experimental designs is more discursive than prescriptive and as a result is also somewhat theoretical. Similar comments apply for the discussion on the quality of the asymptotic results, which wallows a little too much in reports on various simulation studies.
The examples on screening and data transformations experiments are again reworkings of analyses of familiar industrial examples and another obvious motivation for the enthusiasm that the authors have developed for using the GLM toolkit. One can hope that subsequent editions will similarly contain new examples that will have caused the authors to expand the material on generalized additive models and other topics in this chapter. Designating myself to review a book that I know I will love to read is one of the rewards of being editor. I read both of the editions of McCullagh and Nelder (1989), which was reviewed by Schuenemeyer (1992). That book was not fun to read. The obvious enthusiasm of Myers, Montgomery, and Vining and their reliance on their many examples as a major focus of their pedagogy make Generalized Linear Models a joy to read. Every statistician working in any area of applied science should buy it and experience the excitement of these new approaches to familiar activities.

10,520 citations


Additional excerpts

  • ...(Nelder & Wedderburn 1972; McCullagh & Nelder 1989)...

    [...]
