Proceedings Article

Efficient L1 regularized logistic regression

16 Jul 2006, pp. 401-408
TL;DR: Theoretical results show that the proposed efficient algorithm for L1 regularized logistic regression is guaranteed to converge to the global optimum, and experiments show that it significantly outperforms standard algorithms for solving convex optimization problems.
Abstract: L1 regularized logistic regression is now a workhorse of machine learning: it is widely used for many classification problems, particularly ones with many features. L1 regularized logistic regression requires solving a convex optimization problem. However, standard algorithms for solving convex optimization problems do not scale well enough to handle the large datasets encountered in many practical settings. In this paper, we propose an efficient algorithm for L1 regularized logistic regression. Our algorithm iteratively approximates the objective function by a quadratic approximation at the current point, while maintaining the L1 constraint. In each iteration, it uses the efficient LARS (Least Angle Regression) algorithm to solve the resulting L1 constrained quadratic optimization problem. Our theoretical results show that our algorithm is guaranteed to converge to the global optimum. Our experiments show that our algorithm significantly outperforms standard algorithms for solving convex optimization problems. Moreover, our algorithm outperforms four previously published algorithms that were specifically designed to solve the L1 regularized logistic regression problem.
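
The algorithm is easy to sketch. The following is a minimal, hedged illustration of the idea rather than the authors' implementation: each iteration forms the standard IRLS quadratic approximation to the logistic loss at the current point and hands the resulting L1 problem to a LARS-based lasso solver. Scikit-learn's LassoLars is used here as a stand-in for the paper's LARS step, and it solves the penalized rather than the constrained formulation; the penalty value lam is an arbitrary placeholder.

```python
import numpy as np
from sklearn.linear_model import LassoLars

def irls_lars_logistic(X, y, lam=0.01, n_iter=20):
    """Sketch only. X: (n, d) features; y: (n,) labels in {0, 1}."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))           # current class probabilities
        w = np.clip(p * (1.0 - p), 1e-6, None)   # IRLS weights
        z = eta + (y - p) / w                    # IRLS working response
        sw = np.sqrt(w)
        # L1-penalized weighted least squares, solved via the LARS path
        lars = LassoLars(alpha=lam, fit_intercept=False)
        lars.fit(X * sw[:, None], z * sw)
        beta = lars.coef_
    return beta
```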


Citations
Proceedings ArticleDOI
28 May 2018
TL;DR: This paper uses the MSR 2018 Challenge Data of over 3000 developer sessions and over 10 million recorded events; the data are analyzed and cleansed into event series, which can then be used to train a variety of machine learning models, including a neural network, to predict user-induced commands.
Abstract: When a developer is writing code, they are usually focused and in a state of mind which some refer to as flow. Breaking out of this flow can cause the developer to lose their train of thought and have to start their thought process from the beginning. This loss of thought can be caused by interruptions and sometimes slow IDE interactions. Predictive functionality has been harnessed in user applications to speed up load times, such as in Google Chrome's browser, which has a feature called "Predicting Network Actions". This will pre-load web pages that the user is most likely to click through, mitigating the interruption that load times can introduce. In this paper we seek to make the first step towards predicting user commands in the IDE. Using the MSR 2018 Challenge Data of over 3000 developer sessions and over 10 million recorded events, we analyze and cleanse the data to be parsed into event series, which can then be used to train a variety of machine learning models, including a neural network, to predict user-induced commands. Our highest performing model is able to obtain a 5-fold cross-validation prediction accuracy of 64%.
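
For context, a hedged sketch of the evaluation protocol described above, with placeholder features and labels standing in for the MSR 2018 event-series pipeline: 5-fold cross-validated accuracy for a scikit-learn LogisticRegression, the model the citing excerpt below mentions trying.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((1000, 50))           # placeholder event-series features
y = rng.integers(0, 2, size=1000)    # placeholder next-command labels

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)    # 5-fold cross-validation
print(f"mean accuracy: {scores.mean():.2f}")
```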

3 citations


Cites methods from "Efficient L1 regularized logistic r..."

  • ...To widen our variety of models, we also tried Scikit-learn’s Logistic Regression model [9]....

    [...]

DissertationDOI
28 May 2013
TL;DR: Using data generated from a simulated TiN detection paradigm, the method is shown to precisely identify observer cues from a large set of covarying, interdependent stimulus descriptors—a setting where standard correlation and regression methods fail.
Abstract: The central aim of psychophysics is to understand the functional relationship between the physical and the psychological world. Striving for that goal, modern research focuses on quantitatively measuring and explaining observer behavior in specific psychophysical paradigms. In this context, a principal question arises: Which particular stimulus features govern individual decisions in a behavioral task? As regards this problem, the classical psychophysical paradigm of narrow-band Tone-in-Noise (TiN) detection has been under investigation for more than 70 years. This particular experiment stands at the heart of a central notion in auditory perception: the “critical band”. Yet no conclusive answer has been given as to which auditory features listeners employ in this task. The present study describes how a modern statistical analysis procedure can be used to tackle this problem when modeling psychophysical data. The proposed technique combines the concept of relative linear combination weights with an L1-regularized logistic regression—a procedure developed in machine learning. This method enforces “sparse” solutions, a computational approximation to the postulate that a good model should contain the minimal set of predictors necessary to explain the data. This property is essential when extracting the critical perceptual features from observer models after they were fit to behavioral data. Using data generated from a simulated TiN detection paradigm, the method is shown to precisely identify observer cues from a large set of covarying, interdependent stimulus descriptors—a setting where standard correlation and regression methods fail. Furthermore, the detailed decision rules of the simulated observers were reconstructed, allowing predictions of responses on the basis of individual stimuli. The practical part of this study aimed at using the sparse analysis procedure to investigate the perceptual mechanisms underlying the detection performance of human observers in a TiN detection paradigm. Therefore, a large trial-by-trial data set was collected with multiple listeners. Relative perceptual weights were then estimated for a diverse set of auditory features encompassing sound energy, fine structure and envelope. By expanding the common linear observer model to allow for behavioral predictors, sequential dependencies in observer responses were also taken into account. These dependencies generally impair detection performance and even arise when study participants are made aware of the purely random stimulus sequence. The fitted models captured the behavior of all listeners on a single-trial level. The estimated perceptual weights were stable across signal levels. They suggest that all observers depend on stimulus energy, and “critical band”-like detectors in the fine structure domain, while a subset of the listeners exhibited an additional dependence on stimulus envelope. In addition to stimulus characteristics, earlier responses appeared to substantially influence the current decision of some observers. In conclusion, by approaching a classical problem in auditory psychophysics with an advanced statistical analysis procedure, an already large pool of empirical knowledge was expanded in several important aspects. In that process, the power and efficiency of the proposed method was demonstrated. Based on very general concepts, it is flexible enough to be applicable in a wide variety of studies that investigate perceptual mechanisms.
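
As a rough illustration of the sparse-analysis idea, using synthetic data rather than the dissertation's observer models: fit an L1-regularized logistic regression to simulated trial-by-trial responses and read the surviving nonzero weights off as candidate cues. All sizes and the regularization strength C here are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 30))        # stimulus descriptors per trial
true_w = np.zeros(30)
true_w[[2, 7]] = [1.5, -2.0]               # two "real" perceptual cues
y = (X @ true_w + 0.5 * rng.standard_normal(2000)) > 0

model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
model.fit(X, y)
cues = np.flatnonzero(model.coef_[0])      # indices of surviving predictors
print("recovered cue indices:", cues)
```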

3 citations


Cites methods from "Efficient L1 regularized logistic r..."

  • ...Only later, I decided to employ a sparse regularized logistic regression (for which efficient implementations had just been developed (Lee et al., 2006; Park and Hastie, 2007))....

    [...]

Journal ArticleDOI
TL;DR: A statistical learning approach is developed to extract domain-specific QoS features from user-provided service reviews; the approach classifies user reviews, based on their sentiment orientations, into either a positive or negative category.
Abstract: With the fast increase of online services of all kinds, users start to care more about the Quality of Service (QoS) that a service provider can offer besides the functionalities of the services. As a result, QoS-based service selection and recommendation have received significant attention since the mid-2000s. However, existing approaches primarily consider a small number of standard QoS parameters, most of which relate to the response time, fee, availability of services, and so on. As online services start to diversify significantly over different domains, this small set of QoS parameters will not be able to capture the different quality aspects that users truly care about over different domains. Most existing approaches for QoS data collection depend on the information from service providers, which are sensitive to the trustworthiness of the providers. Some service monitoring mechanisms collect QoS data through actual service invocations but may be affected by actual hardware/software configurations. In either case, domain-specific QoS data that capture what users truly care about have not been successfully collected or analyzed by existing works in service computing. To address this demanding issue, we develop a statistical learning approach to extract domain-specific QoS features from user-provided service reviews. In particular, we aim to classify user reviews based on their sentiment orientations into either a positive or negative category. Meanwhile, statistical feature selection is performed to identify statistically nontrivial terms from review text, which can serve as candidate QoS features. We also develop a topic models-based approach that automatically groups relevant terms and returns the term groups to users, where each term group corresponds to one high-level quality aspect of services. We have conducted extensive experiments on three real-world datasets to demonstrate the effectiveness of our approach.
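
A minimal, hedged sketch of the classification-plus-selection step, with toy reviews and placeholder parameters (the topic-model grouping is not shown): TF-IDF term features, a chi-squared filter for statistically nontrivial terms, and a sentiment classifier.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "fast and reliable service",
    "great uptime and support",
    "slow response and frequent outages",
    "terrible availability, very slow",
]
labels = [1, 1, 0, 0]    # 1 = positive, 0 = negative sentiment

pipe = make_pipeline(
    TfidfVectorizer(),
    SelectKBest(chi2, k=5),          # keep statistically nontrivial terms
    LogisticRegression(max_iter=1000),
)
pipe.fit(reviews, labels)
print(pipe.predict(["reliable but slow"]))
```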

3 citations

Journal ArticleDOI
TL;DR: A synchronous parallel block coordinate descent algorithm is proposed for minimizing a composite function, which consists of a smooth convex function plus a non-smooth but separable convex function, along with a randomized variant that randomly updates some blocks of coordinates at each round of computation.
Abstract: This paper proposes a synchronous parallel block coordinate descent algorithm for minimizing a composite function, which consists of a smooth convex function plus a non-smooth but separable convex function. Due to the generality of the proposed method, some existing synchronous parallel algorithms can be considered as special cases. To tackle high dimensional problems, the authors further develop a randomized variant, which randomly updates some blocks of coordinates at each round of computation. Both proposed parallel algorithms are proven to have sub-linear convergence rates under rather mild assumptions. The numerical experiments on solving large-scale regularized logistic regression with an L1-norm penalty show that the implementation is quite efficient. The authors conclude with an explanation of the observed experimental results and a discussion of potential improvements.
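
For intuition, here is a serial, hedged sketch of the building block only (proximal soft-threshold updates over coordinate blocks for L1-regularized logistic regression); the paper's synchronous parallel execution and randomized block selection are not reproduced, and the step size uses a conservative global Lipschitz bound.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def block_cd_logistic(A, y, lam=0.01, block=10, n_epochs=100):
    """Sketch only. A: (n, d) data; y: (n,) labels in {-1, +1}."""
    n, d = A.shape
    x = np.zeros(d)
    L = np.linalg.norm(A, 2) ** 2 / (4 * n)   # Lipschitz bound, logistic loss
    for _ in range(n_epochs):
        for start in range(0, d, block):      # sweep over coordinate blocks
            idx = slice(start, start + block)
            margins = y * (A @ x)
            grad = -(A[:, idx].T @ (y / (1.0 + np.exp(margins)))) / n
            x[idx] = soft_threshold(x[idx] - grad / L, lam / L)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    A = rng.standard_normal((500, 100))
    y = np.sign(A[:, 0] - A[:, 1] + 0.1 * rng.standard_normal(500))
    x = block_cd_logistic(A, y)
    print("nonzero coefficients:", np.flatnonzero(x).size)
```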

3 citations

Journal ArticleDOI
TL;DR: In this paper, the authors determined the most informative pre- and in-cycle variables for predicting success for a first autologous oocyte in-vitro fertilization (IVF) cycle.
Abstract: The aim of this study is to determine the most informative pre- and in-cycle variables for predicting success for a first autologous oocyte in-vitro fertilization (IVF) cycle. This is a retrospective study using 22,413 first autologous oocyte IVF cycles from 2001 to 2018. Models were developed to predict pregnancy following an IVF cycle with a fresh embryo transfer. The importance of each variable was determined by its coefficient in a logistic regression model and the prediction accuracy based on different variable sets was reported. The area under the receiver operating characteristic curve (AUC) on a validation patient cohort was the metric for prediction accuracy. Three factors were found to be of importance when predicting IVF success: age in three groups (38-40, 41-42, and above 42 years old), number of transferred embryos, and number of cryopreserved embryos. For predicting first-cycle IVF pregnancy using all available variables, the predictive model achieved an AUC of 68% ± 0.01%. A parsimonious predictive model utilizing age (38-40, 41-42, and above 42 years old), number of transferred embryos, and number of cryopreserved embryos achieved an AUC of 65% ± 0.01%. The proposed models accurately predict a single IVF cycle pregnancy outcome and identify important predictive variables associated with the outcome. These models are limited to predicting pregnancy immediately after the IVF cycle and not live birth. These models do not include indicators of multiple gestation and are not intended for clinical application.
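
A hedged sketch of the reported evaluation setup, with random placeholder data and an invented variable layout: fit a logistic regression on a few cycle variables and score it by AUC on a held-out validation split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.random((5000, 3))           # age group, embryos transferred, embryos frozen
y = rng.integers(0, 2, size=5000)   # pregnancy outcome (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"validation AUC: {auc:.2f}")
```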

3 citations

References
Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
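
In symbols, the constrained formulation described in the abstract is:

```latex
% Lasso: least squares subject to an L1 budget t >= 0
\hat{\beta} = \arg\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2
\quad \text{subject to} \quad \sum_{j=1}^{p}\lvert\beta_j\rvert \le t
```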

40,785 citations


"EfficientL 1 regularized logistic r..." refers methods in this paper

  • ...(Tibshirani 1996) Several algorithms have been developed to solve L1 constrained least squares problems....

    [...]

  • ...See Tibshirani (1996) for details.)...

    [...]


Book
01 Mar 2004
TL;DR: The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them; the book gives a comprehensive introduction to the subject.
Abstract: Convex optimization problems arise frequently in many different fields. A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency. The focus is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. The text contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance, and economics.

33,341 citations

Book
01 Jan 1983
TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
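
The iterative weighted linear regression mentioned here is the familiar IRLS update; in standard GLM notation (link g, linear predictor eta = X beta, mean mu = g^{-1}(eta)), one step reads:

```latex
% One IRLS step: working response z, weights W, then weighted least squares
z_i = \eta_i + (y_i - \mu_i)\, g'(\mu_i), \qquad
w_i = \big[\operatorname{Var}(y_i)\, g'(\mu_i)^2\big]^{-1}, \qquad
\beta^{(t+1)} = \big(X^\top W X\big)^{-1} X^\top W z
```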

23,215 citations

01 Jan 1998
UCI Repository of machine learning databases (Newman et al. 1998)

12,940 citations


"EfficientL 1 regularized logistic r..." refers methods in this paper

  • ...We tested each algorithm’s performance on 12 different datasets, consisting of 9 UCI datasets (Newman et al. 1998), one artificial dataset called Madelon from the NIPS 2003 workshop on feature extraction,3 and two gene expression datasets (Microarray 1 and 2).4 Table 2 gives details on the number…...

    [...]

  • ...We tested each algorithm’s performance on 12 different real datasets, consisting of 9 UCI datasets (Newman et al. 1998) and 3 gene expression datasets (Microarray 1, 2 and 3) 3....

    [...]

Journal ArticleDOI
TL;DR: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences, and it is thoroughly enjoyable to read.
Abstract: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences. Subtitled “With Applications in Engineering and the Sciences,” this book’s authors all specialize primarily in engineering statistics. The first author has produced several recent editions of Walpole, Myers, and Myers (1998), the last reported by Ziegel (1999). The second author has had several editions of Montgomery and Runger (1999), recently reported by Ziegel (2002). All of the authors are renowned experts in modeling. The first two authors collaborated on a seminal volume in applied modeling (Myers and Montgomery 2002), which had its recent revised edition reported by Ziegel (2002). The last two authors collaborated on the most recent edition of a book on regression analysis (Montgomery, Peck, and Vining 2001), reported by Gray (2002), and the first author has had multiple editions of his own regression analysis book (Myers 1990), the latest of which was reported by Ziegel (1991). A comparable book with similar objectives and a more specific focus on logistic regression, Hosmer and Lemeshow (2000), reported by Conklin (2002), presumed a background in regression analysis and began with generalized linear models. The Preface here (p. xi) indicates an identical requirement but nonetheless begins with 100 pages of material on linear and nonlinear regression. Most of this will probably be a review for the readers of the book. Chapter 2, “Linear Regression Model,” begins with 50 pages of familiar material on estimation, inference, and diagnostic checking for multiple regression. The approach is very traditional, including the use of formal hypothesis tests. In industrial settings, use of p values as part of a risk-weighted decision is generally more appropriate. The pedagogic approach includes formulas and demonstrations for computations, although computing by Minitab is eventually illustrated. Less-familiar material on maximum likelihood estimation, scaled residuals, and weighted least squares provides more specific background for subsequent estimation methods for generalized linear models. This review is not meant to be disparaging. The authors have packed a wealth of useful nuggets for any practitioner in this chapter. It is thoroughly enjoyable to read. Chapter 3, “Nonlinear Regression Models,” is arguably less of a review, because regression analysis courses often give short shrift to nonlinear models. The chapter begins with a great example on the pitfalls of linearizing a nonlinear model for parameter estimation. It continues with the effective balancing of explicit statements concerning the theoretical basis for computations versus the application and demonstration of their use. The details of maximum likelihood estimation are again provided, and weighted and generalized regression estimation are discussed. Chapter 4 is titled “Logistic and Poisson Regression Models.” Logistic regression provides the basic model for generalized linear models. The prior development for weighted regression is used to motivate maximum likelihood estimation for the parameters in the logistic model. The algebraic details are provided. As in the development for linear models, some of the details are pushed into an appendix. In addition to connecting to the foregoing material on regression on several occasions, the authors link their development forward to their following chapter on the entire family of generalized linear models.
They discuss score functions, the variance-covariance matrix, Wald inference, likelihood inference, deviance, and overdispersion. Careful explanations are given for the values provided in standard computer software, here PROC LOGISTIC in SAS. The value in having the book begin with familiar regression concepts is clearly realized when the analogies are drawn between overdispersion and nonhomogeneous variance, or analysis of deviance and analysis of variance. The authors rely on the similarity of Poisson regression methods to logistic regression methods and mostly present illustrations for Poisson regression. These use PROC GENMOD in SAS. The book does not give any of the SAS code that produces the results. Two of the examples illustrate designed experiments and modeling. They include discussion of subset selection and adjustment for overdispersion. The mathematical level of the presentation is elevated in Chapter 5, “The Family of Generalized Linear Models.” First, the authors unify the two preceding chapters under the exponential distribution. The material on the formal structure for generalized linear models (GLMs), likelihood equations, quasilikelihood, the gamma distribution family, and power functions as links is some of the most advanced material in the book. Most of the computational details are relegated to appendixes. A discussion of residuals returns one to a more practical perspective, and two long examples on gamma distribution applications provide excellent guidance on how to put this material into practice. One example is a contrast to the use of linear regression with a log transformation of the response, and the other is a comparison to the use of a different link function in the previous chapter. Chapter 6 considers generalized estimating equations (GEEs) for longitudinal and analogous studies. The first half of the chapter presents the methodology, and the second half demonstrates its application through five different examples. The basis for the general situation is first established using the case with a normal distribution for the response and an identity link. The importance of the correlation structure is explained, the iterative estimation procedure is shown, and estimation for the scale parameters and the standard errors of the coefficients is discussed. The procedures are then generalized for the exponential family of distributions and quasi-likelihood estimation. Two of the examples are standard repeated-measures illustrations from biostatistical applications, but the last three illustrations are all interesting reworkings of industrial applications. The GEE computations in PROC GENMOD are applied to account for correlations that occur with multiple measurements on the subjects or restrictions to randomizations. The examples show that accounting for correlation structure can result in different conclusions. Chapter 7, “Further Advances and Applications in GLM,” discusses several additional topics. These are experimental designs for GLMs, asymptotic results, analysis of screening experiments, data transformation, modeling for both a process mean and variance, and generalized additive models. The material on experimental designs is more discursive than prescriptive and as a result is also somewhat theoretical. Similar comments apply for the discussion on the quality of the asymptotic results, which wallows a little too much in reports on various simulation studies.
The examples on screening and data transformations experiments are again reworkings of analyses of familiar industrial examples and another obvious motivation for the enthusiasm that the authors have developed for using the GLM toolkit. One can hope that subsequent editions will similarly contain new examples that will have caused the authors to expand the material on generalized additive models and other topics in this chapter. Designating myself to review a book that I know I will love to read is one of the rewards of being editor. I read both of the editions of McCullagh and Nelder (1989), which was reviewed by Schuenemeyer (1992). That book was not fun to read. The obvious enthusiasm of Myers, Montgomery, and Vining and their reliance on their many examples as a major focus of their pedagogy make Generalized Linear Models a joy to read. Every statistician working in any area of applied science should buy it and experience the excitement of these new approaches to familiar activities.

10,520 citations


Additional excerpts

  • ...(Nelder & Wedderburn 1972; McCullagh & Nelder 1989)...

    [...]
