Efficient L1 regularized logistic regression

Proceedings Article•

Su-In Lee¹, Honglak Lee¹, Pieter Abbeel¹, Andrew Y. Ng¹•Institutions (1)

16 Jul 2006-pp 401-408

TL;DR: Theoretical results show that the proposed efficient algorithm for L1 regularized logistic regression is guaranteed to converge to the global optimum, and experiments show that it significantly outperforms standard algorithms for solving convex optimization problems.

Abstract: L1 regularized logistic regression is now a workhorse of machine learning: it is widely used for many classification problems, particularly ones with many features. L1 regularized logistic regression requires solving a convex optimization problem. However, standard algorithms for solving convex optimization problems do not scale well enough to handle the large datasets encountered in many practical settings. In this paper, we propose an efficient algorithm for L1 regularized logistic regression. Our algorithm iteratively approximates the objective function by a quadratic approximation at the current point, while maintaining the L1 constraint. In each iteration, it uses the efficient LARS (Least Angle Regression) algorithm to solve the resulting L1 constrained quadratic optimization problem. Our theoretical results show that our algorithm is guaranteed to converge to the global optimum. Our experiments show that our algorithm significantly outperforms standard algorithms for solving convex optimization problems. Moreover, our algorithm outperforms four previously published algorithms that were specifically designed to solve the L1 regularized logistic regression problem.
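The iteration described in the abstract (build a quadratic approximation at the current point, solve the resulting L1 problem, repeat) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the penalized form (mean log-loss plus lam times the L1 norm) rather than the constrained form, swaps LARS for plain coordinate descent with soft-thresholding as the inner solver, omits the intercept, and takes full steps with no line search, so the paper's convergence guarantee does not carry over. The names `l1_logreg`, `soft_threshold`, and `lam` are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(a, t):
    # Closed-form minimizer of 0.5 * (b - a)^2 + t * |b| over b.
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0

def l1_logreg(X, y, lam, outer_iters=50, inner_iters=50):
    """Minimize mean log-loss + lam * ||theta||_1 (labels y in {0, 1})."""
    n, d = len(X), len(X[0])
    theta = [0.0] * d
    for _ in range(outer_iters):
        # Second-order Taylor approximation of the log-loss at theta:
        # a weighted least-squares problem with weights w and targets z.
        eta = [sum(X[i][j] * theta[j] for j in range(d)) for i in range(n)]
        p = [sigmoid(e) for e in eta]
        w = [max(pi * (1.0 - pi), 1e-5) for pi in p]
        z = [eta[i] + (y[i] - p[i]) / w[i] for i in range(n)]
        # Inner solver (assumption: coordinate descent instead of LARS).
        for _ in range(inner_iters):
            for j in range(d):
                num = 0.0
                den = 0.0
                for i in range(n):
                    # Partial residual with coordinate j left out.
                    r = z[i] - sum(X[i][k] * theta[k] for k in range(d) if k != j)
                    num += w[i] * X[i][j] * r
                    den += w[i] * X[i][j] ** 2
                theta[j] = soft_threshold(num / n, lam) / (den / n) if den > 0 else 0.0
    return theta
```

With a sufficiently large `lam` every coefficient is thresholded to exactly zero, which is the sparsity behaviour that makes the L1 penalty attractive for problems with many features.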

Citations

Proceedings Article•DOI•

Smart additive manufacturing empowered by a closed-loop machine learning algorithm

Nariman Razaviarab, Safura Sharifi¹, Yaser M. Banadaki•Institutions (1)

Louisiana State University¹

27 Mar 2019

TL;DR: A closed-loop machine learning algorithm is proposed as a promising way of mitigating the underlying failure phenomena in 3D metal printing by automatically detecting defects in the printed layers, thereby turning metal 3D printers into essentially their own inspectors.

Abstract: Additive manufacturing (AM) is a crucial component of smart manufacturing systems that disrupts traditional supply chains. However, parts built using state-of-the-art powder-bed 3D printers have noticeably unpredictable mechanical properties. In this paper, we propose a closed-loop machine learning algorithm as a promising way of mitigating the underlying failure phenomena in 3D metal printing. We employ a machine learning approach based on a Deep Convolutional Neural Network to automatically detect defects in the printed layers, thereby turning metal 3D printers into essentially their own inspectors. By comparing three deep learning models, we demonstrate that a transfer learning approach based on the Inception-v3 model in the TensorFlow framework can be retrained on our image data set of only 200 samples and achieves a classification accuracy of 100% on the test set. This generates a precise feedback signal that lets a smart 3D printer recognize issues with the build itself and make proper adjustments and corrections without operator intervention. The closed-loop ML algorithm can enhance the quality of the AM process, leading to better parts with fewer quality problems and less wasted time and material.

13 citations

Cites methods from "Efficient L1 regularized logistic r..."

...The DCNN models are trained by simple logistic regression [32] and Inception v3....
[...]

Journal Article•DOI•

Automatic prior shape selection for image edge detection with modified Mumford–Shah model

Yuying Shi¹, Zhimei Huo¹, Jing Qin², Yilin Li•Institutions (2)

North China Electric Power University¹, University of Kentucky²

15 Mar 2020-Computers & Mathematics With Applications

TL;DR: A novel variational model to automatically and adaptively detect one or more prior shapes from the given dictionary to guide the edge detection process and an efficient algorithm based on the Alternating Direction Method of Multipliers is proposed to solve this model with guaranteed convergence.

Abstract: Edge detection plays an important role in the field of image processing. In this paper, we propose a novel variational model to automatically and adaptively detect one or more prior shapes from the given dictionary to guide the edge detection process. In that way, we can effectively detect the shapes of interest from the test image. Moreover, an efficient algorithm based on the Alternating Direction Method of Multipliers (ADMM) is proposed to solve this model with guaranteed convergence. A variety of numerical experiments show that the proposed method has achieved ideal performance for edge detection in images with missing information, various types of noise and complicated background, and even multiple objects.

12 citations

Proceedings Article•DOI•

Recommendation Systems for Markets with Two Sided Preferences

Anjan Goswami¹, Fares Hedayati², Prasant Mohapatra¹•Institutions (2)

University of California, Davis¹, University of California, Berkeley²

03 Dec 2014

TL;DR: It is shown that for markets with two sided preferences, one can improve the AUC (Area Under the receiver operator Curve) score by considering separate models for preferences of both the sides and constructing a two layer architecture for ranking.

Abstract: In recent times we have witnessed the emergence of large online markets with two-sided preferences that are responsible for businesses worth billions of dollars. Recommendation systems are critical components of such markets. Matching in such a market depends on the preferences of both sides; consequently, constructing a recommendation system for such a market calls for consideration of the preferences of both sides. The online dating market and the online freelancer market are examples of markets with two-sided preferences. Recommendation systems for such markets are fundamentally different from typical rating-based product recommendations. We pose this problem as a bipartite ranking problem. There has been extensive research on bipartite ranking algorithms. Typically, generalized linear regression models are popular methods of constructing such rankings on account of their ability to be learned easily from big data and their computational simplicity on engineering platforms. However, we show that for markets with two-sided preferences, one can improve the AUC (Area Under the receiver operator Curve) score by considering separate models for the preferences of both sides and constructing a two-layer architecture for ranking. We call this a two-level model algorithm. For both synthetic and real data we show that the two-level model algorithm has better AUC performance than the direct application of a generalized linear model such as L1 logistic regression or an ensemble method such as the random forest algorithm. We provide a theoretical justification of the AUC optimality of the two-level model and pose a theoretical problem for a more general result.

12 citations

Cites background or methods or result from "Efficient L1 regularized logistic r..."

...We show that this simple two-level model results in a 2 to more than 10% higher AUC (area under the curve) score and similar gains in TPR (true positive rate) and TNR (true negative rate) compared to direct regression based estimation of probability of matching using L1 regularized logistic regression [4] or random forest [5]....
[...]
...In the first level, we estimate the probabilities P ((y = 1)|x̄) and P ((y′ = 1)|x̄) using a L1-LR (l1 regularized logistic regression) [4]....
[...]
...• L1 minimization tends to give sparse solutions and has logarithmic sample complexity bounds [4]....
[...]

Dissertation•

Contribution à la sélection de modèle via pénalisation Lasso en Épidémiologie

Marta Avalos Fernandez¹•Institutions (1)

University of Bordeaux¹

11 Dec 2018

TL;DR: In this work, the author presents an introduction to Big Data in the context of epidemiology, citing the Système National des Données de Santé (SNDS) as an example of Big Data in health.

Abstract: My work focuses mainly on the development, adaptation, implementation, and application of statistical methods for model selection. My main contribution is to adapt supervised statistical learning methods that have become very popular over the last decade, namely Lasso-type penalized regressions, to the analysis of data from epidemiological studies. The challenge is to tackle the problems of large-volume data (Big Data) while respecting the objectives and specificities of the discipline. Large volume here refers to the fact that the number of observations and/or the number of variables is much larger than was traditional in the field, without excluding the case where the number of variables exceeds the number of observations (high-dimensional data). The context of epidemiological practice is changing rapidly with technological developments and the resulting growing availability of Big Data. The Système National des Données de Santé (SNDS), which brings together the main existing public health databases in France, is one example of Big Data in health. "Omics" data (genomic, transcriptomic, proteomic, metabolomic, microbiomic, mycobiomic, viromic, ...) arising from advances in high-throughput sequencing techniques are another example of Big Data in health. Finally, measurements of the exposome (as opposed to genetic factors), which in epidemiology denotes all the environmental exposures an individual undergoes over a lifetime, can also constitute a source of Big Data. This document is organized around three chapters. It summarizes my research activity since 2005, that is, since my recruitment at the University of Bordeaux after my PhD.
The first chapter is a general introduction in which I contextualize, motivate, and state the problem addressed throughout my research. The second chapter is devoted to my work on studies of accidental injuries and medication exposures based on SNDS data. The third chapter is devoted to my work on biomedical studies: on the one hand, predicting viral load censored by a detection threshold from HIV mutations, and on the other hand, automating the detection of abnormality thresholds for complete blood counts in the general population.

12 citations

Proceedings Article•DOI•

The impact of images on user clicks in product search

Sung Hwan Chung¹, Anjan Goswami¹, Honglak Lee², Junling Hu¹•Institutions (2)

eBay¹, University of Michigan²

12 Aug 2012

TL;DR: This paper proposes adding information extracted from the thumbnail image of the item as additional features for click prediction and uses two types of image features -- photographic features and object features.

Abstract: Product search engines face unique challenges that differ from web page search. The goal of a product search engine is to rank relevant items that the user may be interested in purchasing. Clicks provide a strong signal of a user's interest in an item. Traditional click prediction models include many features such as document text, price, and user information. In this paper, we propose adding information extracted from the thumbnail image of the item as additional features for click prediction. Specifically, we use two types of image features -- photographic features and object features. Our experiments reveal that both types of features can be highly useful in click prediction. We measure our performance in both prediction accuracy and NDCG. Overall, our experiments show that augmenting a standard click prediction model with image features provides significant improvement in precision and recall and boosts NDCG.

12 citations

Cites methods from "Efficient L1 regularized logistic r..."

...To address this problem, we used the L1 regularized logistic regression [27] to perform feature selection....
[...]

References

Journal Article•DOI•

Regression Shrinkage and Selection via the Lasso

Robert Tibshirani

01 Jan 1996-Journal of the Royal Statistical Society, Series B (Methodological)

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.

Abstract: SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
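The constrained problem that this abstract describes in words can be written out as a short formula. The notation is an assumption made for illustration (responses $y_i$, predictors $x_{ij}$, intercept $\beta_0$, coefficients $\beta_j$, and bound $t \ge 0$); the abstract itself fixes no symbols.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
\quad \text{subject to} \quad
  \sum_{j=1}^{p} \lvert \beta_j \rvert \le t .
```

Small values of $t$ shrink the coefficients and force some of them to exactly zero, which is the source of the interpretability mentioned in the abstract.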

40,785 citations

"Efficient L1 regularized logistic r..." refers methods in this paper

...(Tibshirani 1996) Several algorithms have been developed to solve L1 constrained least squares problems....
[...]
...See, Tibshirani (1996) for details.)...
[...]

Book•

Convex Optimization

Stephen Boyd¹, Lieven Vandenberghe²•Institutions (2)

Stanford University¹, University of California, Los Angeles²

01 Mar 2004

TL;DR: A comprehensive introduction to convex optimization that focuses on recognizing convex optimization problems and then finding the most appropriate technique for solving them.

Abstract: Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.

33,341 citations

Book•

Generalized Linear Models

Peter McCullagh¹, John A. Nelder•Institutions (1)

Imperial College London¹

01 Jan 1983

TL;DR: A generalization of the analysis of variance is given for generalized linear models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).

Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
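The iterative weighted linear regression mentioned in this abstract can be sketched for one concrete case. The example below is an illustration only, assuming a Poisson response with a log link and a single predictor plus intercept, so that each weighted regression reduces to a 2x2 linear system; the function name `irls_poisson` and the starting values are hypothetical choices, not taken from the book.

```python
import math

def irls_poisson(x, y, iters=25):
    """Fit log E[y] = b0 + b1 * x by iteratively reweighted least squares."""
    # Start from the intercept-only fit.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # For the log link with Poisson variance, the weight is mu and the
        # working response is z = eta + (y - mu) / mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Weighted normal equations (X^T W X) b = X^T W z for two parameters.
        s0 = sum(mu)
        s1 = sum(m * xi for m, xi in zip(mu, x))
        s2 = sum(m * xi * xi for m, xi in zip(mu, x))
        t0 = sum(m * zi for m, zi in zip(mu, z))
        t1 = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        det = s0 * s2 - s1 * s1
        # Solve the 2x2 system by Cramer's rule.
        b0 = (t0 * s2 - t1 * s1) / det
        b1 = (s0 * t1 - s1 * t0) / det
    return b0, b1
```

For data that follow the model exactly, such as x = [0, 1, 2] and y = [1, 2, 4], the iteration settles on an intercept of 0 and a slope of log 2.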

23,215 citations

UCI Repository of machine learning databases

Catherine Blake

01 Jan 1998

12,940 citations

"Efficient L1 regularized logistic r..." refers methods in this paper

...We tested each algorithm’s performance on 12 different datasets, consisting of 9 UCI datasets (Newman et al. 1998), one artificial dataset called Madelon from the NIPS 2003 workshop on feature extraction,3 and two gene expression datasets (Microarray 1 and 2).4 Table 2 gives details on the number…...
[...]
...We tested each algorithm’s performance on 12 different real datasets, consisting of 9 UCI datasets (Newman et al. 1998) and 3 gene expression datasets (Microarray 1, 2 and 3) 3....
[...]

Journal Article•DOI•

Generalized Linear Models

Eric R. Ziegel

01 Aug 2002-Technometrics

TL;DR: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences, and it is thoroughly enjoyable to read.

Abstract: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences. Subtitled “With Applications in Engineering and the Sciences,” this book’s authors all specialize primarily in engineering statistics. The first author has produced several recent editions of Walpole, Myers, and Myers (1998), the last reported by Ziegel (1999). The second author has had several editions of Montgomery and Runger (1999), recently reported by Ziegel (2002). All of the authors are renowned experts in modeling. The first two authors collaborated on a seminal volume in applied modeling (Myers and Montgomery 2002), which had its recent revised edition reported by Ziegel (2002). The last two authors collaborated on the most recent edition of a book on regression analysis (Montgomery, Peck, and Vining 2001), reported by Gray (2002), and the first author has had multiple editions of his own regression analysis book (Myers 1990), the latest of which was reported by Ziegel (1991). A comparable book with similar objectives and a more specific focus on logistic regression, Hosmer and Lemeshow (2000), reported by Conklin (2002), presumed a background in regression analysis and began with generalized linear models. The Preface here (p. xi) indicates an identical requirement but nonetheless begins with 100 pages of material on linear and nonlinear regression. Most of this will probably be a review for the readers of the book. Chapter 2, “Linear Regression Model,” begins with 50 pages of familiar material on estimation, inference, and diagnostic checking for multiple regression. The approach is very traditional, including the use of formal hypothesis tests. In industrial settings, use of p values as part of a risk-weighted decision is generally more appropriate. The pedagogic approach includes formulas and demonstrations for computations, although computing by Minitab is eventually illustrated.
Less-familiar material on maximum likelihood estimation, scaled residuals, and weighted least squares provides more specific background for subsequent estimation methods for generalized linear models. This review is not meant to be disparaging. The authors have packed a wealth of useful nuggets for any practitioner in this chapter. It is thoroughly enjoyable to read. Chapter 3, “Nonlinear Regression Models,” is arguably less of a review, because regression analysis courses often give short shrift to nonlinear models. The chapter begins with a great example on the pitfalls of linearizing a nonlinear model for parameter estimation. It continues with the effective balancing of explicit statements concerning the theoretical basis for computations versus the application and demonstration of their use. The details of maximum likelihood estimation are again provided, and weighted and generalized regression estimation are discussed. Chapter 4 is titled “Logistic and Poisson Regression Models.” Logistic regression provides the basic model for generalized linear models. The prior development for weighted regression is used to motivate maximum likelihood estimation for the parameters in the logistic model. The algebraic details are provided. As in the development for linear models, some of the details are pushed into an appendix. In addition to connecting to the foregoing material on regression on several occasions, the authors link their development forward to their following chapter on the entire family of generalized linear models. They discuss score functions, the variance-covariance matrix, Wald inference, likelihood inference, deviance, and overdispersion. Careful explanations are given for the values provided in standard computer software, here PROC LOGISTIC in SAS.
The value in having the book begin with familiar regression concepts is clearly realized when the analogies are drawn between overdispersion and nonhomogenous variance, or analysis of deviance and analysis of variance. The authors rely on the similarity of Poisson regression methods to logistic regression methods and mostly present illustrations for Poisson regression. These use PROC GENMOD in SAS. The book does not give any of the SAS code that produces the results. Two of the examples illustrate designed experiments and modeling. They include discussion of subset selection and adjustment for overdispersion. The mathematic level of the presentation is elevated in Chapter 5, “The Family of Generalized Linear Models.” First, the authors unify the two preceding chapters under the exponential distribution. The material on the formal structure for generalized linear models (GLMs), likelihood equations, quasilikelihood, the gamma distribution family, and power functions as links is some of the most advanced material in the book. Most of the computational details are relegated to appendixes. A discussion of residuals returns one to a more practical perspective, and two long examples on gamma distribution applications provide excellent guidance on how to put this material into practice. One example is a contrast to the use of linear regression with a log transformation of the response, and the other is a comparison to the use of a different link function in the previous chapter. Chapter 6 considers generalized estimating equations (GEEs) for longitudinal and analogous studies. The first half of the chapter presents the methodology, and the second half demonstrates its application through five different examples. The basis for the general situation is first established using the case with a normal distribution for the response and an identity link.
The importance of the correlation structure is explained, the iterative estimation procedure is shown, and estimation for the scale parameters and the standard errors of the coefficients is discussed. The procedures are then generalized for the exponential family of distributions and quasi-likelihood estimation. Two of the examples are standard repeated-measures illustrations from biostatistical applications, but the last three illustrations are all interesting reworkings of industrial applications. The GEE computations in PROC GENMOD are applied to account for correlations that occur with multiple measurements on the subjects or restrictions to randomizations. The examples show that accounting for correlation structure can result in different conclusions. Chapter 7, “Further Advances and Applications in GLM,” discusses several additional topics. These are experimental designs for GLMs, asymptotic results, analysis of screening experiments, data transformation, modeling for both a process mean and variance, and generalized additive models. The material on experimental designs is more discursive than prescriptive and as a result is also somewhat theoretical. Similar comments apply for the discussion on the quality of the asymptotic results, which wallows a little too much in reports on various simulation studies. The examples on screening and data transformations experiments are again reworkings of analyses of familiar industrial examples and another obvious motivation for the enthusiasm that the authors have developed for using the GLM toolkit. One can hope that subsequent editions will similarly contain new examples that will have caused the authors to expand the material on generalized additive models and other topics in this chapter. Designating myself to review a book that I know I will love to read is one of the rewards of being editor. I read both of the editions of McCullagh and Nelder (1989), which was reviewed by Schuenemeyer (1992).
That book was not fun to read. The obvious enthusiasm of Myers, Montgomery, and Vining and their reliance on their many examples as a major focus of their pedagogy make Generalized Linear Models a joy to read. Every statistician working in any area of applied science should buy it and experience the excitement of these new approaches to familiar activities.

10,520 citations