Book

Applied Logistic Regression

TL;DR: Hosmer and Lemeshow provide an accessible introduction to the logistic regression model while incorporating advances of the last decade, including a variety of software packages for the analysis of data sets.
Abstract: From the reviews of the First Edition:
"An interesting, useful, and well-written book on logistic regression models... Hosmer and Lemeshow have used very little mathematics, have presented difficult concepts heuristically and through illustrative examples, and have included references." - Choice
"Well written, clearly organized, and comprehensive... the authors carefully walk the reader through the estimation of interpretation of coefficients from a wide variety of logistic regression models... their careful explication of the quantitative re-expression of coefficients from these various models is excellent." - Contemporary Sociology
"An extremely well-written book that will certainly prove an invaluable acquisition to the practicing statistician who finds other literature on analysis of discrete data hard to follow or heavily theoretical." - The Statistician
In this revised and updated edition of their popular book, David Hosmer and Stanley Lemeshow continue to provide an amazingly accessible introduction to the logistic regression model while incorporating advances of the last decade, including a variety of software packages for the analysis of data sets. Hosmer and Lemeshow extend the discussion from biostatistics and epidemiology to cutting-edge applications in data mining and machine learning, guiding readers step-by-step through the use of modeling techniques for dichotomous data in diverse fields. Ample new topics and expanded discussions of existing material are accompanied by a wealth of real-world examples, with extensive data sets available over the Internet.
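For readers who want the model in symbols, the standard logistic regression model for a dichotomous outcome with one covariate can be written as below. The notation (\pi, \beta_0, \beta_1) is a common convention assumed here and is not quoted from the abstract. The last expression is the usual re-expression of a coefficient as an odds ratio for a one-unit increase in x.

```latex
\pi(x) = \Pr(Y = 1 \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}},
\qquad
\operatorname{logit}\,\pi(x) = \ln\frac{\pi(x)}{1 - \pi(x)} = \beta_0 + \beta_1 x,
\qquad
\mathrm{OR} = e^{\beta_1}.
```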
Citations
Journal ArticleDOI
TL;DR: This article discusses an easily interpretable index of predictive discrimination and methods for assessing calibration of predicted survival probabilities; the methods are particularly needed for binary, ordinal, and time-to-event outcomes.
Abstract: Multivariable regression models are powerful tools that are used frequently in studies of clinical outcomes. These models can use a mixture of categorical and continuous variables and can handle partially observed (censored) responses. However, uncritical application of modelling techniques can result in models that poorly fit the dataset at hand, or, even more likely, inaccurately predict outcomes on new subjects. One must know how to measure qualities of a model's fit in order to avoid poorly fitted or overfitted models. Measurement of predictive accuracy can be difficult for survival time data in the presence of censoring. We discuss an easily interpretable index of predictive discrimination as well as methods for assessing calibration of predicted survival probabilities. Both types of predictive accuracy should be unbiasedly validated using bootstrapping or cross-validation, before using predictions in a new data series. We discuss some of the hazards of poorly fitted and overfitted regression models and present one modelling strategy that avoids many of the problems discussed. The methods described are applicable to all regression models, but are particularly needed for binary, ordinal, and time-to-event outcomes. Methods are illustrated with a survival analysis in prostate cancer using Cox regression.
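The abstract does not name its index of predictive discrimination. A widely used index of this kind is the concordance index (c), and the sketch below assumes that choice. It counts, among usable pairs, how often the subject who failed earlier was assigned the higher predicted risk. The function name and the simplified handling of tied times are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Share of usable pairs where the earlier failure had the higher risk score.

    times: observed follow-up times
    events: 1 if the event was observed, 0 if the time is censored
    risk_scores: model output where larger means higher predicted risk
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times and pairs whose shorter time is censored.
        if times[a] == times[b] or not events[a]:
            continue
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data (made up): the third subject is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random predictions and 1.0 to perfect ordering. As the abstract stresses, an estimate computed on the development data is optimistic and should be validated with bootstrapping or cross-validation.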

7,879 citations

Journal ArticleDOI
18 Jun 2003-JAMA
TL;DR: Notably, major depressive disorder is a common disorder, widely distributed in the population, and usually associated with substantial symptom severity and role impairment, and while the recent increase in treatment is encouraging, inadequate treatment is a serious concern.
Abstract:
Context: Uncertainties exist about prevalence and correlates of major depressive disorder (MDD).
Objective: To present nationally representative data on prevalence and correlates of MDD by Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria, and on study patterns and correlates of treatment and treatment adequacy from the recently completed National Comorbidity Survey Replication (NCS-R).
Design: Face-to-face household survey conducted from February 2001 to December 2002.
Setting: The 48 contiguous United States.
Participants: Household residents ages 18 years or older (N = 9090) who responded to the NCS-R survey.
Main Outcome Measures: Prevalence and correlates of MDD using the World Health Organization's (WHO) Composite International Diagnostic Interview (CIDI), 12-month severity with the Quick Inventory of Depressive Symptomatology Self-Report (QIDS-SR), the Sheehan Disability Scale (SDS), and the WHO disability assessment scale (WHO-DAS). Clinical reinterviews used the Structured Clinical Interview for DSM-IV.
Results: The prevalence of CIDI MDD for lifetime was 16.2% (95% confidence interval [CI], 15.1-17.3) (32.6-35.1 million US adults) and for 12-month was 6.6% (95% CI, 5.9-7.3) (13.1-14.2 million US adults). Virtually all CIDI 12-month cases were independently classified as clinically significant using the QIDS-SR, with 10.4% mild, 38.6% moderate, 38.0% severe, and 12.9% very severe. Mean episode duration was 16 weeks (95% CI, 15.1-17.3). Role impairment as measured by SDS was substantial as indicated by 59.3% of 12-month cases with severe or very severe role impairment. Most lifetime (72.1%) and 12-month (78.5%) cases had comorbid CIDI/DSM-IV disorders, with MDD only rarely primary. Although 51.6% (95% CI, 46.1-57.2) of 12-month cases received health care treatment for MDD, treatment was adequate in only 41.9% (95% CI, 35.9-47.9) of these cases, resulting in 21.7% (95% CI, 18.1-25.2) of 12-month MDD being adequately treated. Sociodemographic correlates of treatment were far less numerous than those of prevalence.
Conclusions: Major depressive disorder is a common disorder, widely distributed in the population, and usually associated with substantial symptom severity and role impairment. While the recent increase in treatment is encouraging, inadequate treatment is a serious concern. Emphasis on screening and expansion of treatment needs to be accompanied by a parallel emphasis on treatment quality improvement.

7,706 citations

Journal ArticleDOI
TL;DR: In this paper, instead of selecting factors by stepwise backward elimination, the authors focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection.
Abstract: Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor analysis-of-variance problem as the most important and well-known example. Instead of selecting factors by stepwise backward elimination, we focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection. The lasso, the LARS algorithm and the non-negative garrotte are recently proposed regression methods that can be used to select individual variables. We study and propose efficient algorithms for the extensions of these methods for factor selection and show that these extensions give superior performance to the traditional stepwise backward elimination method in factor selection problems. We study the similarities and the differences between these methods. Simulations and real examples are used to illustrate the methods.
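The abstract gives no formula. One common way to write a grouped extension of the lasso is the penalized least-squares problem below, where the coefficients of each factor are shrunk, and possibly set to zero, as a block. The symbols are assumptions for illustration: J factors, X_j the design columns of factor j, \beta_j its coefficient vector of length p_j, and \lambda \ge 0 a tuning parameter.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta}\;
\frac{1}{2}\Bigl\lVert Y - \sum_{j=1}^{J} X_j \beta_j \Bigr\rVert_2^2
+ \lambda \sum_{j=1}^{J} \sqrt{p_j}\,\lVert \beta_j \rVert_2
```

When every factor has a single column (p_j = 1), the penalty reduces to the ordinary lasso penalty \lambda \sum_j |\beta_j|.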

7,400 citations

Journal ArticleDOI
TL;DR: Thirteen recommendations are made to enable the objective selection of an error assessment technique for ecological presence/absence models and a new approach to estimating prediction error, which is based on the spatial characteristics of the errors, is proposed.
Abstract: Predicting the distribution of endangered species from habitat data is frequently perceived to be a useful technique. Models that predict the presence or absence of a species are normally judged by the number of prediction errors. These may be of two types: false positives and false negatives. Many of the prediction errors can be traced to ecological processes such as unsaturated habitat and species interactions. Consequently, if prediction errors are not placed in an ecological context the results of the model may be misleading. The simplest, and most widely used, measure of prediction accuracy is the number of correctly classified cases. There are other measures of prediction success that may be more appropriate. Strategies for assessing the causes and costs of these errors are discussed. A range of techniques for measuring error in presence/absence models, including some that are seldom used by ecologists (e.g. ROC plots and cost matrices), are described. A new approach to estimating prediction error, which is based on the spatial characteristics of the errors, is proposed. Thirteen recommendations are made to enable the objective selection of an error assessment technique for ecological presence/absence models.
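The two error types and the "correctly classified cases" measure mentioned in the abstract can be illustrated with a small confusion-matrix sketch. The function name, the returned keys, and the toy data are assumptions for illustration, not taken from the paper.

```python
def presence_absence_errors(observed, predicted):
    """Tabulate a 2x2 confusion matrix for presence (1) / absence (0) data."""
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        if obs == 1 and pred == 1:
            tp += 1  # presence correctly predicted
        elif obs == 0 and pred == 1:
            fp += 1  # false positive: predicted present, observed absent
        elif obs == 0 and pred == 0:
            tn += 1  # absence correctly predicted
        else:
            fn += 1  # false negative: predicted absent, observed present
    total = tp + fp + tn + fn
    return {
        "false_positives": fp,
        "false_negatives": fn,
        "correct_classification_rate": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of observed presences predicted
        "specificity": tn / (tn + fp),  # share of observed absences predicted
    }


# Toy survey (made up): 3 occupied sites, 5 unoccupied sites.
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
summary = presence_absence_errors(observed, predicted)
```

Reporting sensitivity and specificity alongside the overall rate shows which type of error dominates, which the overall rate alone can hide when presences are rare.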

6,044 citations

Journal ArticleDOI
22 Dec 1993-JAMA
TL;DR: The SAPS II, based on a large international sample of patients, provides an estimate of the risk of death without having to specify a primary diagnosis, and is a starting point for future evaluation of the efficiency of intensive care units.
Abstract:
Objective: To develop and validate a new Simplified Acute Physiology Score, the SAPS II, from a large sample of surgical and medical patients, and to provide a method to convert the score to a probability of hospital mortality.
Design and Setting: The SAPS II and the probability of hospital mortality were developed and validated using data from consecutive admissions to 137 adult medical and/or surgical intensive care units in 12 countries.
Patients: The 13,152 patients were randomly divided into developmental (65%) and validation (35%) samples. Patients younger than 18 years, burn patients, coronary care patients, and cardiac surgery patients were excluded.
Outcome Measure: Vital status at hospital discharge.
Results: The SAPS II includes only 17 variables: 12 physiology variables, age, type of admission (scheduled surgical, unscheduled surgical, or medical), and three underlying disease variables (acquired immunodeficiency syndrome, metastatic cancer, and hematologic malignancy). Goodness-of-fit tests indicated that the model performed well in the developmental sample and validated well in an independent sample of patients (P=.883 and P=.104 in the developmental and validation samples, respectively). The area under the receiver operating characteristic curve was 0.88 in the developmental sample and 0.86 in the validation sample.
Conclusion: The SAPS II, based on a large international sample of patients, provides an estimate of the risk of death without having to specify a primary diagnosis. This is a starting point for future evaluation of the efficiency of intensive care units. (JAMA. 1993;270:2957-2963)
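Converting a severity score to a probability of death is typically done with a logistic function. The sketch below shows that general idea only. The intercept and slope are placeholder values chosen for illustration and are not the published SAPS II equation, which the abstract does not state.

```python
import math


def score_to_probability(score, intercept, slope):
    """Map a score to a probability through the logistic function."""
    logit = intercept + slope * score
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients, NOT the SAPS II coefficients.
PLACEHOLDER_INTERCEPT = -4.0
PLACEHOLDER_SLOPE = 0.08

low_risk = score_to_probability(20, PLACEHOLDER_INTERCEPT, PLACEHOLDER_SLOPE)
mid_risk = score_to_probability(50, PLACEHOLDER_INTERCEPT, PLACEHOLDER_SLOPE)
high_risk = score_to_probability(80, PLACEHOLDER_INTERCEPT, PLACEHOLDER_SLOPE)
```

With a positive slope, the predicted probability rises monotonically with the score and always stays between 0 and 1.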

5,836 citations