Proceedings Article

Efficient L1 regularized logistic regression

16 Jul 2006 - pp 401-408
TL;DR: Theoretical results show that the proposed efficient algorithm for L1 regularized logistic regression is guaranteed to converge to the global optimum, and experiments show that it significantly outperforms standard algorithms for solving convex optimization problems.
Abstract: L1 regularized logistic regression is now a workhorse of machine learning: it is widely used for many classification problems, particularly ones with many features. L1 regularized logistic regression requires solving a convex optimization problem. However, standard algorithms for solving convex optimization problems do not scale well enough to handle the large datasets encountered in many practical settings. In this paper, we propose an efficient algorithm for L1 regularized logistic regression. Our algorithm iteratively approximates the objective function by a quadratic approximation at the current point, while maintaining the L1 constraint. In each iteration, it uses the efficient LARS (Least Angle Regression) algorithm to solve the resulting L1 constrained quadratic optimization problem. Our theoretical results show that our algorithm is guaranteed to converge to the global optimum. Our experiments show that our algorithm significantly outperforms standard algorithms for solving convex optimization problems. Moreover, our algorithm outperforms four previously published algorithms that were specifically designed to solve the L1 regularized logistic regression problem.
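
To make the iteration concrete, here is a minimal sketch of the approach the abstract describes, not the authors' implementation: each pass forms the IRLS (weighted least squares) approximation of the logistic loss at the current weights and hands it to a LARS-style lasso solver. It assumes a dense feature matrix X, binary labels y in {0, 1}, and uses scikit-learn's LassoLars with a penalty weight alpha in place of the paper's explicit L1-norm constraint; the backtracking line search the paper uses to guarantee convergence is omitted.

```python
# Minimal sketch of the IRLS + LARS idea described above (not the authors' code).
# Assumptions: dense X (n x d), binary labels y in {0, 1}, and scikit-learn's
# LassoLars standing in for LARS, so the L1 constraint appears in penalized
# form (alpha) rather than as an explicit L1-ball constraint.
import numpy as np
from sklearn.linear_model import LassoLars

def irls_lars_sketch(X, y, alpha=0.01, n_iter=20, tol=1e-6):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # current probabilities
        W = np.clip(p * (1.0 - p), 1e-8, None)    # IRLS weights
        z = X @ w + (y - p) / W                   # working response
        sqrtW = np.sqrt(W)
        # L1-penalized weighted least squares = quadratic approximation + L1
        lars = LassoLars(alpha=alpha, fit_intercept=False)
        lars.fit(X * sqrtW[:, None], z * sqrtW)
        w_new = lars.coef_
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w
```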


Citations
Proceedings ArticleDOI
13 Aug 2016
TL;DR: In this article, the authors propose LIME, a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem.
Abstract: Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
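
The core of the method is easy to sketch: perturb the instance being explained, query the black-box model on the perturbations, weight them by proximity, and fit a sparse linear surrogate whose coefficients act as the explanation. The snippet below is a simplified illustration of that idea for tabular data, not the authors' LIME library; `predict_proba`, the Gaussian perturbation scheme, and the kernel width are all assumptions.

```python
# Rough sketch of LIME's core idea for tabular data (not the authors' library):
# sample perturbations around one instance, weight them by proximity, and fit a
# sparse linear surrogate whose coefficients serve as the explanation.
# `predict_proba` is any black-box classifier's probability function (assumed).
import numpy as np
from sklearn.linear_model import Lasso

def explain_instance(x, predict_proba, n_samples=5000, kernel_width=0.75, alpha=0.01):
    d = x.shape[0]
    # Perturb the instance with Gaussian noise (a simplification of LIME's sampling).
    Z = x + np.random.normal(scale=1.0, size=(n_samples, d))
    y = predict_proba(Z)[:, 1]                           # black-box predictions
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))  # proximity kernel
    # Weighted sparse linear surrogate: its nonzero coefficients are the explanation.
    surrogate = Lasso(alpha=alpha)
    surrogate.fit(Z, y, sample_weight=weights)
    return surrogate.coef_
```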

11,104 citations

Journal ArticleDOI
TL;DR: The performance of lasso penalized logistic regression in case–control disease gene mapping with a large number of SNP (single nucleotide polymorphism) predictors is evaluated; the coeliac disease results replicate previous SNP findings and shed light on possible interactions among the SNPs.
Abstract: Motivation: In ordinary regression, imposition of a lasso penalty makes continuous model selection straightforward. Lasso penalized regression is particularly advantageous when the number of predictors far exceeds the number of observations. Method: The present article evaluates the performance of lasso penalized logistic regression in case–control disease gene mapping with a large number of SNP (single nucleotide polymorphism) predictors. The strength of the lasso penalty can be tuned to select a predetermined number of the most relevant SNPs and other predictors. For a given value of the tuning constant, the penalized likelihood is quickly maximized by cyclic coordinate ascent. Once the most potent marginal predictors are identified, their two-way and higher order interactions can also be examined by lasso penalized logistic regression. Results: This strategy is tested on both simulated and real data. Our findings on coeliac disease replicate the previous SNP results and shed light on possible interactions among the SNPs. Availability: The software discussed is available in Mendel 9.0 at the UCLA Human Genetics web site. Contact: klange@ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online.
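
As a rough illustration of the tuning step described above (selecting a predetermined number of predictors by adjusting the penalty), the sketch below sweeps the regularization strength of an L1-penalized logistic regression and keeps the fit whose support size is closest to a target k. It uses scikit-learn's liblinear solver rather than the paper's cyclic coordinate ascent, and the grid of C values is an arbitrary choice.

```python
# Illustrative sketch (not the Mendel software): fit an L1-penalized logistic
# regression and tune the penalty until roughly a target number of predictors
# (e.g. SNPs) remain nonzero. scikit-learn's liblinear solver is assumed; the
# paper itself maximizes the penalized likelihood by cyclic coordinate ascent.
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_k_predictors(X, y, k, c_grid=np.logspace(-3, 1, 40)):
    best = None
    for C in c_grid:  # weaker penalty as C grows
        model = LogisticRegression(penalty="l1", C=C, solver="liblinear")
        model.fit(X, y)
        n_nonzero = np.count_nonzero(model.coef_)
        if best is None or abs(n_nonzero - k) < abs(best[0] - k):
            best = (n_nonzero, C, model)
    return best  # (number selected, tuning constant, fitted model)
```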

784 citations

Proceedings ArticleDOI
Galen Andrew, Jianfeng Gao
20 Jun 2007
TL;DR: This work presents an algorithm Orthant-Wise Limited-memory Quasi-Newton (OWL-QN), based on L-BFGS, that can efficiently optimize the L1-regularized log-likelihood of log-linear models with millions of parameters.
Abstract: The L-BFGS limited-memory quasi-Newton method is the algorithm of choice for optimizing the parameters of large-scale log-linear models with L2 regularization, but it cannot be used for an L1-regularized loss due to its non-differentiability whenever some parameter is zero. Efficient algorithms have been proposed for this task, but they are impractical when the number of parameters is very large. We present an algorithm Orthant-Wise Limited-memory Quasi-Newton (OWL-QN), based on L-BFGS, that can efficiently optimize the L1-regularized log-likelihood of log-linear models with millions of parameters. In our experiments on a parse reranking task, our algorithm was several orders of magnitude faster than an alternative algorithm, and substantially faster than L-BFGS on the analogous L2-regularized problem. We also present a proof that OWL-QN is guaranteed to converge to a globally optimal parameter vector.
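
The distinctive ingredient of OWL-QN is the orthant-wise pseudo-gradient, which replaces the ordinary gradient wherever the L1 term is non-differentiable. The sketch below shows just that computation; the full algorithm also projects each quasi-Newton step onto the current orthant and maintains L-BFGS curvature pairs, which are omitted here.

```python
# Sketch of the orthant-wise pseudo-gradient at the heart of OWL-QN.
# `grad_loss` is the gradient of the smooth loss at w, and `c` is the L1 weight.
import numpy as np

def pseudo_gradient(w, grad_loss, c):
    pg = np.zeros_like(w)
    pos = w > 0
    neg = w < 0
    pg[pos] = grad_loss[pos] + c
    pg[neg] = grad_loss[neg] - c
    zero = ~(pos | neg)
    right = grad_loss[zero] + c   # directional derivative moving positive
    left = grad_loss[zero] - c    # directional derivative moving negative
    pg_zero = np.zeros(zero.sum())
    pg_zero[right < 0] = right[right < 0]   # objective decreases going positive
    pg_zero[left > 0] = left[left > 0]      # objective decreases going negative
    pg[zero] = pg_zero                      # otherwise zero is already optimal
    return pg
```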

612 citations


Cites methods from "Efficient L1 regularized logistic r..."

  • ...Lee et al. (2006) propose the algorithm irls-lars, inspired by Newton’s method, which iteratively minimizes the function’s second order Taylor expansion, subject to linear constraints....

    [...]


01 Jan 2007
TL;DR: An efficient interior-point method for solving large-scale L1-regularized logistic regression problems is described; a variation that uses a preconditioned conjugate gradient method to compute the search step scales to very large problems, such as the 20 Newsgroups data set.
Abstract: Logistic regression with L1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale L1-regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, that uses a preconditioned conjugate gradient method to compute the search step, can solve very large problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few minutes, on a PC. Using warm-start techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
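
The warm-start idea in the last sentence can be illustrated independently of the interior-point machinery: sweep the penalty from strong to weak and start each solve from the previous solution. The sketch below does this with scikit-learn's saga solver (an assumption; the paper uses its own interior-point method), so it mirrors only the path-following strategy, not the solver.

```python
# Sketch of warm-starting a regularization path for L1 logistic regression.
# Uses scikit-learn's saga solver as a stand-in for the paper's interior-point
# method; warm_start=True reuses the previous coefficients at each step.
import numpy as np
from sklearn.linear_model import LogisticRegression

def l1_logreg_path(X, y, Cs=np.logspace(-3, 2, 30)):
    model = LogisticRegression(penalty="l1", solver="saga",
                               warm_start=True, max_iter=5000)
    path = []
    for C in np.sort(Cs):          # increasing C = decreasing penalty
        model.C = C
        model.fit(X, y)            # starts from the previous solution
        path.append(model.coef_.copy())
    return np.vstack(path)
```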

596 citations

References
Journal ArticleDOI
TL;DR: Consideration of the primal and dual problems together leads to important new insights into the characteristics of the LASSO estimator and to an improved method for estimating its covariance matrix.
Abstract: Proposed by Tibshirani, the least absolute shrinkage and selection operator (LASSO) estimates a vector of regression coefficients by minimizing the residual sum of squares subject to a constraint on the l1-norm of the coefficient vector. The LASSO estimator typically has one or more zero elements and thus shares characteristics of both shrinkage estimation and variable selection. In this article we treat the LASSO as a convex programming problem and derive its dual. Consideration of the primal and dual problems together leads to important new insights into the characteristics of the LASSO estimator and to an improved method for estimating its covariance matrix. Using these results we also develop an efficient algorithm for computing LASSO estimates which is usable even in cases where the number of regressors exceeds the number of observations. An S-Plus library based on this algorithm is available from StatLib.
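
For reference, the primal problem the abstract treats can be written in the standard constrained form (a textbook statement, not quoted from the paper):

```latex
% LASSO primal: least squares subject to an L1-norm constraint on the coefficients
\min_{\beta \in \mathbb{R}^p} \; \tfrac{1}{2}\,\|y - X\beta\|_2^2
\quad \text{subject to} \quad \|\beta\|_1 \le t
```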

763 citations


"EfficientL 1 regularized logistic r..." refers methods in this paper

  • ...Roth (2004) proposed an algorithm called generalized LASSO that extends a LASSO algorithm proposed by Osborne et al. (2000)....

    [...]


Journal ArticleDOI
TL;DR: The scope of application of iteratively reweighted least squares to statistical estimation problems is considerably wider than is generally appreciated: it extends beyond exponential-family-type generalized linear models to other distributions, to non-linear parameterizations, and to dependent observations.
Abstract: The scope of application of iteratively reweighted least squares to statistical estimation problems is considerably wider than is generally appreciated. It extends beyond the exponential-family-type generalized linear models to other distributions, to non-linear parameterizations, and to dependent observations. Various criteria for estimation other than maximum likelihood, including resistant alternatives, may be used. The algorithms are generally numerically stable, easily programmed without the aid of packages, and highly suited to interactive computation.
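
To make the reformulation concrete for the logistic case the citing paper relies on, one IRLS/Newton step reduces to a weighted ordinary least squares solve with weights p(1-p) and a working response built from the current fit. The sketch below is illustrative only and assumes a dense X and labels y in {0, 1}.

```python
# One IRLS/Newton step for (unregularized) logistic regression, to make the
# "weighted least squares" reformulation concrete. Purely illustrative.
import numpy as np

def irls_step(X, y, w):
    p = 1.0 / (1.0 + np.exp(-X @ w))        # fitted probabilities
    W = np.clip(p * (1.0 - p), 1e-8, None)  # weights from the current fit
    z = X @ w + (y - p) / W                 # working (adjusted) response
    # Weighted least squares: solve (X^T W X) w_new = X^T W z
    XtW = X.T * W
    return np.linalg.solve(XtW @ X, XtW @ z)
```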

586 citations


"EfficientL 1 regularized logistic r..." refers background or methods in this paper

  • ...(Green 1984; Minka 2003) IRLS reformulates the problem of finding the step direction for Newton’s method as a weighted ordinary least squares problem....

    [...]

  • ...(See Green 1984, or Minka 2003 for details of this derivation.)...

    [...]

  • ...In particular, our algorithm can be used for parameter learning for L1 constrained generalized linear models....

    [...]

  • ...In the k’th iteration, it finds a step direction γ(k) by solving the constrained least squares problem of Equation (11)....

    [...]

01 Jan 2004
TL;DR: This note compares eight different algorithms for computing the maximum a-posteriori parameter estimate and finds the fastest algorithms turn out to be conjugate gradient ascent and quasi-Newton algorithms, which far outstrip Iterative Scaling and its variants.
Abstract: Logistic regression is a workhorse of statistics and is closely related to methods used in Machine Learning, including the Perceptron and the Support Vector Machine. This note compares eight different algorithms for computing the maximum a-posteriori parameter estimate. A full derivation of each algorithm is given. In particular, a new derivation of Iterative Scaling is given which applies more generally than the conventional one. A new derivation is also given for the Modified Iterative Scaling algorithm of Collins et al. (2002). Most of the algorithms operate in the primal space, but can also work in dual space. All algorithms are compared in terms of computational complexity by experiments on large data sets. The fastest algorithms turn out to be conjugate gradient ascent and quasi-Newton algorithms, which far outstrip Iterative Scaling and its variants.
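
As an illustration of the quasi-Newton approach the note finds fastest, the sketch below maximizes the MAP objective for logistic regression with a Gaussian prior (L2 penalty) using SciPy's L-BFGS-B; labels are assumed to be in {-1, +1} and the prior precision alpha is arbitrary. It is a stand-in, not the note's benchmark code.

```python
# Quasi-Newton MAP estimation for logistic regression with a Gaussian prior.
# Labels y are assumed to be in {-1, +1}; alpha is the prior precision.
import numpy as np
from scipy.optimize import minimize

def map_logistic(X, y, alpha=1.0):
    n, d = X.shape

    def neg_log_posterior(w):
        m = X @ w
        # sum of log(1 + exp(-y*m)) plus the Gaussian prior term
        loss = np.sum(np.logaddexp(0.0, -y * m)) + 0.5 * alpha * w @ w
        p = 1.0 / (1.0 + np.exp(y * m))          # sigmoid(-y*m)
        grad = -(X.T @ (y * p)) + alpha * w
        return loss, grad

    res = minimize(neg_log_posterior, np.zeros(d), jac=True, method="L-BFGS-B")
    return res.x
```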

290 citations


"EfficientL 1 regularized logistic r..." refers background or methods in this paper

  • ...(Green 1984; Minka 2003) IRLS reformulates the problem of finding the step direction for Newton’s method as a weighted ordinary least squares problem....

    [...]

  • ...(See Green 1984, or Minka 2003 for details of this derivation.)...

    [...]

Journal ArticleDOI
Volker Roth
TL;DR: This paper presents a different class of kernel regressors, generalized LASSO regression, that effectively overcomes several drawbacks of the SVM, and a highly efficient algorithm with guaranteed global convergence that defines a unique framework for sparse regression models in the very rich class of IRLS models.
Abstract: In the last few years, the support vector machine (SVM) method has motivated new interest in kernel regression techniques. Although the SVM has been shown to exhibit excellent generalization properties in many experiments, it suffers from several drawbacks, both of a theoretical and a technical nature: the absence of probabilistic outputs, the restriction to Mercer kernels, and the steep growth of the number of support vectors with increasing size of the training set. In this paper, we present a different class of kernel regressors that effectively overcome the above problems. We call this approach generalized LASSO regression. It has a clear probabilistic interpretation, can handle learning sets that are corrupted by outliers, produces extremely sparse solutions, and is capable of dealing with large-scale problems. For regression functionals which can be modeled as iteratively reweighted least-squares (IRLS) problems, we present a highly efficient algorithm with guaranteed global convergence. This defines a unique framework for sparse regression models in the very rich class of IRLS models, including various types of robust regression models and logistic regression. Performance studies for many standard benchmark datasets effectively demonstrate the advantages of this model over related approaches.

281 citations


"EfficientL 1 regularized logistic r..." refers methods in this paper

  • ...Roth (2004) proposed an algorithm called generalized LASSO that extends a LASSO algorithm proposed by Osborne et al. (2000)....

    [...]

  • ...Experimental details on the other algorithms We compared our algorithm (IRLS-LARS) to four previously published algorithms: Grafting (Perkins & Theiler 2003), Generalized LASSO (Roth 2004), SCGIS (Goodman 2004), and Gl1ce (Lokhorst 1999)....

    [...]

Proceedings Article
21 Aug 2003
TL;DR: It is argued that existing feature selection methods do not perform well in this scenario, and a promising alternative method is described, based on a stagewise gradient descent technique which is called grafting.
Abstract: In the standard feature selection problem, we are given a fixed set of candidate features for use in a learning problem, and must select a subset that will be used to train a model that is "as good as possible" according to some criterion. In this paper, we present an interesting and useful variant, the online feature selection problem, in which, instead of all features being available from the start, features arrive one at a time. The learner's task is to select a subset of features and return a corresponding model at each time step which is as good as possible given the features seen so far. We argue that existing feature selection methods do not perform well in this scenario, and describe a promising alternative method, based on a stagewise gradient descent technique which we call grafting.
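
Grafting's admission rule is compact enough to show directly: a feature currently fixed at zero enters the working set only when the magnitude of the loss gradient with respect to its weight exceeds the L1 penalty, since otherwise the penalty keeps it at zero. The helper below is a hypothetical illustration of that test; `grad_loss`, `active_mask`, and `lam` are assumed inputs.

```python
# Sketch of grafting's feature-admission test: pick the inactive feature whose
# loss gradient most exceeds the L1 penalty, or report that none qualifies.
import numpy as np

def grafting_candidate(grad_loss, active_mask, lam):
    violation = np.abs(grad_loss) - lam          # how strongly each feature "pulls"
    violation[active_mask] = -np.inf             # only consider inactive features
    j = int(np.argmax(violation))
    return j if violation[j] > 0 else None       # None: current working set is optimal
```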

211 citations


"EfficientL 1 regularized logistic r..." refers methods in this paper

  • ...Figure 1 shows the results for the five algorithms specifically designed for L1 regularized logistic regression (IRLS-LARS, Grafting, SCGIS, GenLASSO and Gl1ce) on 12 datasets....

    [...]

  • ...More specifically, in 8 (out of 12) datasets our method was more than 8 times faster than Grafting....

    [...]

  • ...For example, for the algorithms that are based on conjugate gradient and Newton’s method, we extensively tuned the parameters of the line-search algorithm; for the algorithms using conjugate gradient (Grafting, CG-epsL1, CG-Huber and CGL1), we tested both our own conjugate gradient implementation as well as the MATLAB optimization toolbox’s conjugate gradient; for the algorithms that use the approximate L1 penalty term, we tried many choices for ε in the range 10^-15 < ε < 0.01 and chose the one with the shortest running time; etc....

    [...]

  • ...Grafting uses a local derivative test in each iteration of the conjugate gradient method, to choose an additional feature that is allowed to differ from zero....

    [...]

  • ...Experimental details on the other algorithms We compared our algorithm (IRLS-LARS) to four previously published algorithms: Grafting (Perkins & Theiler 2003), Generalized LASSO (Roth 2004), SCGIS (Goodman 2004), and Gl1ce (Lokhorst 1999)....

    [...]