Efficient L1 regularized logistic regression

Proceedings Article

Su-In Lee, Honglak Lee, Pieter Abbeel, Andrew Y. Ng

16 Jul 2006, pp. 401-408

TL;DR: Theoretical results show that the proposed efficient algorithm for L1 regularized logistic regression is guaranteed to converge to the global optimum, and experiments show that it significantly outperforms standard algorithms for solving convex optimization problems.

Abstract: L1 regularized logistic regression is now a workhorse of machine learning: it is widely used for many classification problems, particularly ones with many features. L1 regularized logistic regression requires solving a convex optimization problem. However, standard algorithms for solving convex optimization problems do not scale well enough to handle the large datasets encountered in many practical settings. In this paper, we propose an efficient algorithm for L1 regularized logistic regression. Our algorithm iteratively approximates the objective function by a quadratic approximation at the current point, while maintaining the L1 constraint. In each iteration, it uses the efficient LARS (Least Angle Regression) algorithm to solve the resulting L1 constrained quadratic optimization problem. Our theoretical results show that our algorithm is guaranteed to converge to the global optimum. Our experiments show that our algorithm significantly outperforms standard algorithms for solving convex optimization problems. Moreover, our algorithm outperforms four previously published algorithms that were specifically designed to solve the L1 regularized logistic regression problem.
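The loop described in the abstract (build a quadratic approximation at the current point, solve an L1 subproblem, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the penalized form with a hypothetical weight `lam` rather than the L1 constraint, and it swaps LARS for cyclic coordinate descent as the inner solver. All function names are made up for this sketch.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def soft_threshold(a, lam):
    # Closed-form minimizer of 0.5*(g - a)^2 + lam*|g| over g.
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0

def objective(X, y, theta, lam):
    # Negative log-likelihood (labels in {0, 1}) plus lam * ||theta||_1.
    nll = 0.0
    for xi, yi in zip(X, y):
        t = dot(xi, theta)
        nll += max(t, 0.0) + math.log1p(math.exp(-abs(t))) - yi * t
    return nll + lam * sum(abs(v) for v in theta)

def fit_l1_logistic(X, y, lam, outer_iters=50, inner_iters=50, tol=1e-10):
    n, d = len(X), len(X[0])
    theta = [0.0] * d
    for _ in range(outer_iters):
        # 1. Quadratic (IRLS) approximation of the log-loss at theta.
        eta = [dot(xi, theta) for xi in X]
        p = [sigmoid(e) for e in eta]
        w = [max(pi * (1.0 - pi), 1e-6) for pi in p]
        z = [e + (yi - pi) / wi for e, yi, pi, wi in zip(eta, y, p, w)]
        # 2. Minimize 0.5*sum_i w_i*(z_i - x_i.g)^2 + lam*||g||_1 by
        #    cyclic coordinate descent (assumed stand-in for LARS).
        g = list(theta)
        r = [zi - ei for zi, ei in zip(z, eta)]  # residuals z - X g
        for _ in range(inner_iters):
            for j in range(d):
                num = sum(w[i] * X[i][j] * (r[i] + X[i][j] * g[j]) for i in range(n))
                den = sum(w[i] * X[i][j] ** 2 for i in range(n))
                new = soft_threshold(num, lam) / den if den > 0.0 else 0.0
                delta = new - g[j]
                if delta != 0.0:
                    for i in range(n):
                        r[i] -= X[i][j] * delta
                    g[j] = new
        # 3. Backtrack along the segment theta -> g until the true
        #    objective does not increase.
        f_old = objective(X, y, theta, lam)
        step, accepted = 1.0, None
        while step > 1e-10:
            cand = [(1.0 - step) * a + step * b for a, b in zip(theta, g)]
            if objective(X, y, cand, lam) <= f_old:
                accepted = cand
                break
            step *= 0.5
        if accepted is None:
            break
        theta = accepted
        if f_old - objective(X, y, theta, lam) < tol:
            break
    return theta
```

The acceptance test in step 3 is the simplest possible one (no sufficient-decrease condition); a production solver would likely use a stricter line search and a convergence check on the subproblem as well.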

Citations

Book Chapter

Classification of RNA-seq Data

Kean Ming Tan¹, Ashley Petersen¹, Daniela Witten¹

University of Washington¹

01 Jan 2014

TL;DR: The use of, and modifications to, logistic regression, linear discriminant analysis, principal components analysis, partial least squares, and the support vector machine in the high-dimensional setting are discussed.

Abstract: Next-generation sequencing technologies have made it possible to obtain, at a relatively low cost, a detailed snapshot of the RNA transcripts present in a tissue sample. The resulting reads are usually binned by gene, exon, or other region of interest; thus the data typically amount to read counts for tens of thousands of features, on no more than dozens or hundreds of observations. It is often of interest to use these data to develop a classifier in order to assign an observation to one of several pre-defined classes. However, the high dimensionality of the data poses statistical challenges: because there are far more features than observations, many existing classification techniques cannot be directly applied. In recent years, a number of proposals have been made to extend existing classification approaches to the high-dimensional setting. In this chapter, we discuss the use of, and modifications to, logistic regression, linear discriminant analysis, principal components analysis, partial least squares, and the support vector machine in the high-dimensional setting. We illustrate these methods on two RNA-sequencing data sets.

16 citations

Proceedings Article

Learning dynamic temporal graphs for oil-production equipment monitoring system

Yan Liu¹, Jayant R. Kalagnanam¹, Oivind Johnsen¹

IBM¹

28 Jun 2009

TL;DR: A dynamic temporal graphical model based on hidden Markov model regression and lasso-type algorithms is developed; it integrates two usually separate tasks, inferring underlying states and learning temporal graphs, in one unified model.

Abstract: Learning temporal graph structures from time series data reveals important dependency relationships between current observations and histories. Most previous work focuses on learning and predicting with "static" temporal graphs only. However, in many applications such as mechanical systems and biological systems, the temporal dependencies might change over time. In this paper, we develop a dynamic temporal graphical model based on hidden Markov model regression and lasso-type algorithms. Our method is able to integrate two usually separate tasks, i.e. inferring underlying states and learning temporal graphs, in one unified model. The output temporal graphs provide a better understanding of complex systems, i.e. how their dependency graphs evolve over time, and achieve more accurate predictions. We examine our model on two synthetic datasets as well as a real application dataset for monitoring oil-production equipment to capture different stages of the system, and achieve promising results.

16 citations

Additional excerpts

...Following the idea in [14], we add a Laplacian prior for β as follows: P(β|λ) = (λ/2) exp(−λ‖β‖₁)....
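Why a Laplacian prior leads to an L1 penalty can be seen from a short MAP derivation. The sketch below is an illustration only: the data symbol \mathcal{D} and the generic likelihood P(\mathcal{D} | β) are placeholders that do not appear in the excerpt, and the prior is taken exactly as quoted.

```latex
\begin{align*}
\hat{\beta}
  &= \arg\max_{\beta}\; \log P(\mathcal{D}\mid\beta) + \log P(\beta\mid\lambda) \\
  &= \arg\max_{\beta}\; \log P(\mathcal{D}\mid\beta) + \log\frac{\lambda}{2} - \lambda\|\beta\|_1 \\
  &= \arg\min_{\beta}\; -\log P(\mathcal{D}\mid\beta) + \lambda\|\beta\|_1 .
\end{align*}
```

The term log(λ/2) does not depend on β, so it drops out of the optimization and the prior contributes only the penalty λ‖β‖₁.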

Journal Article

Identifying diagnosis-specific genotype-phenotype associations via joint multitask sparse canonical correlation analysis and classification.

Lei Du¹, Fang Liu¹, Kefei Liu², Xiaohui Yao², Shannon L. Risacher³, Junwei Han¹, Lei Guo¹, Andrew J. Saykin³, Li Shen², Alzheimer’s Disease Neuroimaging Initiative

Northwestern Polytechnical University¹, University of Pennsylvania², Indiana University³

01 Jul 2020, Bioinformatics

TL;DR: A new joint multitask learning method, named MT-SCCALR, which absorbs the merits of both SCCA and logistic regression and yields better or similar canonical correlation coefficients and classification performances than two state-of-the-art methods.

Abstract: Motivation: Brain imaging genetics studies the complex associations between genotypic data such as single nucleotide polymorphisms (SNPs) and imaging quantitative traits (QTs). Neurodegenerative disorders usually exhibit diversity and heterogeneity, so different diagnostic groups might carry distinct imaging QTs, SNPs and their interactions. Sparse canonical correlation analysis (SCCA) is widely used to identify bi-multivariate genotype-phenotype associations. However, most existing SCCA methods are unsupervised, leading to an inability to identify diagnosis-specific genotype-phenotype associations. Results: In this article, we propose a new joint multitask learning method, named MT-SCCALR, which absorbs the merits of both SCCA and logistic regression. MT-SCCALR learns genotype-phenotype associations of multiple tasks jointly, with each task focusing on identifying one diagnosis-specific genotype-phenotype pattern. Meanwhile, MT-SCCALR can not only select relevant SNPs and imaging QTs for each diagnostic group alone, but also allows the selection of those shared by multiple diagnostic groups. We derive an efficient optimization algorithm whose convergence to a local optimum is guaranteed. Compared with two state-of-the-art methods, MT-SCCALR yields better or similar canonical correlation coefficients and classification performances. In addition, it produces much better discriminative canonical weight patterns of great interest than competitors. This demonstrates the power and capability of MT-SCCALR in identifying diagnostically heterogeneous genotype-phenotype patterns, which would be helpful to understand the pathophysiology of brain disorders. Availability and implementation: The software is publicly available at https://github.com/dulei323/MTSCCALR. Supplementary information: Supplementary data are available at Bioinformatics online.

16 citations

Journal Article

Feature Selection via l1-Penalized Squared-Loss Mutual Information

Wittawat Jitkrittum¹, Hirotaka Hachiya¹, Masashi Sugiyama¹

Tokyo Institute of Technology¹

01 Jul 2013, IEICE Transactions on Information and Systems

TL;DR: L1-LSMI is proposed, an L1-regularization based algorithm that maximizes a squared-loss variant of mutual information between selected features and outputs that performs well in handling redundancy, detecting non-linear dependency, and considering feature interaction.

Abstract: Feature selection is a technique to screen out less important features. Many existing supervised feature selection algorithms use redundancy and relevancy as the main criteria to select features. However, feature interaction, potentially a key characteristic in real-world problems, has not received much attention. As an attempt to take feature interaction into account, we propose L1-LSMI, an L1-regularization based algorithm that maximizes a squared-loss variant of mutual information between selected features and outputs. Numerical results show that L1-LSMI performs well in handling redundancy, detecting non-linear dependency, and considering feature interaction.

16 citations

Additional excerpts

...(10), e.g., projected Newton-type methods [Lee et al., 2006, Schmidt et al., 2007]....

Proceedings Article

A Method for Large-Scale l1-Regularized Logistic Regression.


Kwangmoo Koh, Seung-Jean Kim, Stephen Boyd

01 Jan 2007

TL;DR: Numerical experiments show that the efficient interior-point method described here outperforms standard methods for solving convex optimization problems as well as other methods specifically designed for l1-regularized LRPs.

Abstract: Logistic regression with l1 regularization has been proposed as a promising method for feature selection in classification problems. Several specialized solution methods have been proposed for l1-regularized logistic regression problems (LRPs). However, existing methods do not scale well to large problems that arise in many practical settings. In this paper we describe an efficient interior-point method for solving l1-regularized LRPs. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC. A variation on the basic method, that uses a preconditioned conjugate gradient method to compute the search step, can solve large sparse problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few tens of minutes, on a PC. Numerical experiments show that our method outperforms standard methods for solving convex optimization problems as well as other methods specifically designed for l1-regularized LRPs. Introduction. Logistic regression: Let x ∈ R denote a vector of feature variables, and b ∈ {−1,+1} denote the associated binary output. In the logistic model, the conditional probability of b, given x, has the form Prob(b|x) = 1/(1 + exp (

16 citations

References

Journal Article

Regression Shrinkage and Selection via the Lasso

Robert Tibshirani

01 Jan 1996, Journal of the Royal Statistical Society: Series B (Methodological)

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.

Abstract: SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
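The claim that the constraint produces coefficients that are exactly 0 is easiest to see in the special case of an orthonormal design, where the constrained problem reduces to soft-thresholding the least-squares coefficients by a common amount chosen so that the absolute values sum to the bound. The sketch below assumes that special case; the function name `lasso_orthonormal` and the arguments `ols` (least-squares coefficients) and `t` (the bound on the sum of absolute values) are hypothetical names for illustration.

```python
import math

def lasso_orthonormal(ols, t):
    """Lasso coefficients for an orthonormal design (assumed), given the
    least-squares coefficients `ols` and the L1 budget `t`."""
    if t <= 0.0:
        return [0.0] * len(ols)
    if sum(abs(b) for b in ols) <= t:
        return list(ols)  # constraint inactive: least squares is feasible
    # Find the common shrinkage gamma with sum(max(|b| - gamma, 0)) == t.
    mags = sorted((abs(b) for b in ols), reverse=True)
    running, gamma = 0.0, 0.0
    for k, m in enumerate(mags, start=1):
        running += m
        candidate = (running - t) / k
        if m - candidate > 0.0:
            gamma = candidate
    # Soft-threshold every coefficient by gamma, keeping its sign.
    return [math.copysign(max(abs(b) - gamma, 0.0), b) for b in ols]

# Least-squares coefficients 3, -1, 0.5 with budget 3: each magnitude
# shrinks by 0.5, so the smallest coefficient is set exactly to zero.
print(lasso_orthonormal([3.0, -1.0, 0.5], 3.0))  # [2.5, -0.5, 0.0]
```

Ridge regression, by contrast, would scale all three coefficients towards zero without setting any of them exactly to zero.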

40,785 citations

"Efficient L1 regularized logistic regression" refers to methods in this paper

...(Tibshirani 1996) Several algorithms have been developed to solve L1 constrained least squares problems....
...See, Tibshirani (1996) for details.)...

Book

Convex Optimization

Stephen Boyd¹, Lieven Vandenberghe²

Stanford University¹, University of California, Los Angeles²

01 Mar 2004

TL;DR: A comprehensive introduction to convex optimization that shows in detail how such problems can be solved numerically with great efficiency; the focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them.

Abstract: Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.

33,341 citations

Book

Generalized Linear Models

Peter McCullagh¹, John A. Nelder

Imperial College London¹

01 Jan 1983

TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).

Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).
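As a concrete instance of iterative weighted linear regression, the sketch below fits a two-parameter logistic model (binomial family with the logit link, chosen here as an assumption for illustration) by repeatedly solving a weighted least-squares problem on a working response. The function name `irls_logistic` and the toy data in the usage note are hypothetical.

```python
import math

def irls_logistic(xs, ys, iters=25):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by iteratively
    reweighted least squares. Assumes non-separable data, so the
    fitted probabilities stay strictly between 0 and 1."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x                 # linear predictor
            p = 1.0 / (1.0 + math.exp(-eta))  # fitted mean
            w = p * (1.0 - p)                 # regression weight
            z = eta + (y - p) / w             # working response
            s_w += w
            s_wx += w * x
            s_wxx += w * x * x
            s_wz += w * z
            s_wxz += w * x * z
        # Solve the 2x2 weighted normal equations for (b0, b1).
        det = s_w * s_wxx - s_wx * s_wx
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1
```

At convergence the likelihood score equations hold, which gives a simple check: with `xs = [-2, -1, -0.5, 0.5, 1, 2]` and `ys = [0, 0, 1, 0, 1, 1]`, the sums of `y - p` and of `x * (y - p)` over the data should both be numerically zero.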

23,215 citations

UCI Repository of machine learning databases

Catherine Blake

01 Jan 1998

12,940 citations

"Efficient L1 regularized logistic regression" refers to methods in this paper

...We tested each algorithm’s performance on 12 different datasets, consisting of 9 UCI datasets (Newman et al. 1998), one artificial dataset called Madelon from the NIPS 2003 workshop on feature extraction, and two gene expression datasets (Microarray 1 and 2). Table 2 gives details on the number…...
...We tested each algorithm’s performance on 12 different real datasets, consisting of 9 UCI datasets (Newman et al. 1998) and 3 gene expression datasets (Microarray 1, 2 and 3)....

Journal Article

Generalized Linear Models

Eric R. Ziegel

01 Aug 2002, Technometrics

TL;DR: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences, and it is thoroughly enjoyable to read.

Abstract: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences. Subtitled “With Applications in Engineering and the Sciences,” this book’s authors all specialize primarily in engineering statistics. The first author has produced several recent editions of Walpole, Myers, and Myers (1998), the last reported by Ziegel (1999). The second author has had several editions of Montgomery and Runger (1999), recently reported by Ziegel (2002). All of the authors are renowned experts in modeling. The first two authors collaborated on a seminal volume in applied modeling (Myers and Montgomery 2002), which had its recent revised edition reported by Ziegel (2002). The last two authors collaborated on the most recent edition of a book on regression analysis (Montgomery, Peck, and Vining 2001), reported by Gray (2002), and the first author has had multiple editions of his own regression analysis book (Myers 1990), the latest of which was reported by Ziegel (1991). A comparable book with similar objectives and a more specific focus on logistic regression, Hosmer and Lemeshow (2000), reported by Conklin (2002), presumed a background in regression analysis and began with generalized linear models. The Preface here (p. xi) indicates an identical requirement but nonetheless begins with 100 pages of material on linear and nonlinear regression. Most of this will probably be a review for the readers of the book. Chapter 2, “Linear Regression Model,” begins with 50 pages of familiar material on estimation, inference, and diagnostic checking for multiple regression. The approach is very traditional, including the use of formal hypothesis tests. In industrial settings, use of p values as part of a risk-weighted decision is generally more appropriate. The pedagogic approach includes formulas and demonstrations for computations, although computing by Minitab is eventually illustrated.
Less-familiar material on maximum likelihood estimation, scaled residuals, and weighted least squares provides more specific background for subsequent estimation methods for generalized linear models. This review is not meant to be disparaging. The authors have packed a wealth of useful nuggets for any practitioner in this chapter. It is thoroughly enjoyable to read. Chapter 3, “Nonlinear Regression Models,” is arguably less of a review, because regression analysis courses often give short shrift to nonlinear models. The chapter begins with a great example on the pitfalls of linearizing a nonlinear model for parameter estimation. It continues with the effective balancing of explicit statements concerning the theoretical basis for computations versus the application and demonstration of their use. The details of maximum likelihood estimation are again provided, and weighted and generalized regression estimation are discussed. Chapter 4 is titled “Logistic and Poisson Regression Models.” Logistic regression provides the basic model for generalized linear models. The prior development for weighted regression is used to motivate maximum likelihood estimation for the parameters in the logistic model. The algebraic details are provided. As in the development for linear models, some of the details are pushed into an appendix. In addition to connecting to the foregoing material on regression on several occasions, the authors link their development forward to their following chapter on the entire family of generalized linear models. They discuss score functions, the variance-covariance matrix, Wald inference, likelihood inference, deviance, and overdispersion. Careful explanations are given for the values provided in standard computer software, here PROC LOGISTIC in SAS.
The value in having the book begin with familiar regression concepts is clearly realized when the analogies are drawn between overdispersion and nonhomogenous variance, or analysis of deviance and analysis of variance. The authors rely on the similarity of Poisson regression methods to logistic regression methods and mostly present illustrations for Poisson regression. These use PROC GENMOD in SAS. The book does not give any of the SAS code that produces the results. Two of the examples illustrate designed experiments and modeling. They include discussion of subset selection and adjustment for overdispersion. The mathematic level of the presentation is elevated in Chapter 5, “The Family of Generalized Linear Models.” First, the authors unify the two preceding chapters under the exponential distribution. The material on the formal structure for generalized linear models (GLMs), likelihood equations, quasilikelihood, the gamma distribution family, and power functions as links is some of the most advanced material in the book. Most of the computational details are relegated to appendixes. A discussion of residuals returns one to a more practical perspective, and two long examples on gamma distribution applications provide excellent guidance on how to put this material into practice. One example is a contrast to the use of linear regression with a log transformation of the response, and the other is a comparison to the use of a different link function in the previous chapter. Chapter 6 considers generalized estimating equations (GEEs) for longitudinal and analogous studies. The first half of the chapter presents the methodology, and the second half demonstrates its application through five different examples. The basis for the general situation is first established using the case with a normal distribution for the response and an identity link.
The importance of the correlation structure is explained, the iterative estimation procedure is shown, and estimation for the scale parameters and the standard errors of the coefficients is discussed. The procedures are then generalized for the exponential family of distributions and quasi-likelihood estimation. Two of the examples are standard repeated-measures illustrations from biostatistical applications, but the last three illustrations are all interesting reworkings of industrial applications. The GEE computations in PROC GENMOD are applied to account for correlations that occur with multiple measurements on the subjects or restrictions to randomizations. The examples show that accounting for correlation structure can result in different conclusions. Chapter 7, “Further Advances and Applications in GLM,” discusses several additional topics. These are experimental designs for GLMs, asymptotic results, analysis of screening experiments, data transformation, modeling for both a process mean and variance, and generalized additive models. The material on experimental designs is more discursive than prescriptive and as a result is also somewhat theoretical. Similar comments apply for the discussion on the quality of the asymptotic results, which wallows a little too much in reports on various simulation studies. The examples on screening and data transformations experiments are again reworkings of analyses of familiar industrial examples and another obvious motivation for the enthusiasm that the authors have developed for using the GLM toolkit. One can hope that subsequent editions will similarly contain new examples that will have caused the authors to expand the material on generalized additive models and other topics in this chapter. Designating myself to review a book that I know I will love to read is one of the rewards of being editor. I read both of the editions of McCullagh and Nelder (1989), which was reviewed by Schuenemeyer (1992).
That book was not fun to read. The obvious enthusiasm of Myers, Montgomery, and Vining and their reliance on their many examples as a major focus of their pedagogy make Generalized Linear Models a joy to read. Every statistician working in any area of applied science should buy it and experience the excitement of these new approaches to familiar activities.

10,520 citations