Journal ArticleDOI

A simulation study of the number of events per variable in logistic regression analysis.

TL;DR: Low EPV can lead to major problems: regression coefficients were biased in both positive and negative directions, and paradoxical associations (significance in the wrong direction) became more frequent.
About: This article was published in the Journal of Clinical Epidemiology on 1996-12-01. It has received 6490 citations to date. The article focuses on the topics: Sample variance & Logistic regression.
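As a rough illustration of the quantity under study, the sketch below computes events per variable (EPV) and checks it against the 10-EPV rule of thumb quoted in the citing excerpts on this page. The function names and the example counts are hypothetical, and taking the rarer of the two outcome classes as the "events" is an assumption made here, not something stated in this listing.

```python
def events_per_variable(n_events: int, n_nonevents: int, n_predictors: int) -> float:
    """Return EPV: count of the rarer outcome divided by the number of predictors.

    Assumption: the limiting count is the smaller of events and non-events.
    """
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return min(n_events, n_nonevents) / n_predictors


def meets_rule_of_thumb(epv: float, threshold: float = 10.0) -> bool:
    """Check EPV against a rule-of-thumb threshold (10 by default)."""
    return epv >= threshold


# Hypothetical study: 60 events, 440 non-events, 8 candidate predictors.
epv = events_per_variable(n_events=60, n_nonevents=440, n_predictors=8)
print(f"EPV = {epv:.1f}, meets 10-EPV rule: {meets_rule_of_thumb(epv)}")
```

With these made-up counts the model has 7.5 events per variable, so it would fall below the conventional threshold.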
Citations
Journal ArticleDOI
TL;DR: The propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects. The article also describes different causal average treatment effects and their relationship with propensity score analyses.
Abstract: The propensity score is the probability of treatment assignment conditional on observed baseline characteristics. The propensity score allows one to design and analyze an observational (nonrandomized) study so that it mimics some of the particular characteristics of a randomized controlled trial. In particular, the propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects. I describe 4 different propensity score methods: matching on the propensity score, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, and covariate adjustment using the propensity score. I describe balance diagnostics for examining whether the propensity score model has been adequately specified. Furthermore, I discuss differences between regression-based methods and propensity score-based methods for the analysis of observational data. I describe different causal average treatment effects and their relationship with propensity score analyses.

7,895 citations
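One of the four methods named in the abstract above, inverse probability of treatment weighting, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the propensity scores are supplied directly as hypothetical values (in practice they would be estimated, for example by logistic regression), the weights target the average treatment effect, and the function names are illustrative rather than taken from the cited article.

```python
def iptw_weights(treated, propensity):
    """ATE weights: 1/e for treated subjects, 1/(1 - e) for untreated subjects."""
    weights = []
    for z, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / e if z == 1 else 1.0 / (1.0 - e))
    return weights


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def iptw_ate(outcome, treated, propensity):
    """Difference in weighted mean outcomes, treated minus untreated."""
    w = iptw_weights(treated, propensity)
    y1 = [(y, wi) for y, z, wi in zip(outcome, treated, w) if z == 1]
    y0 = [(y, wi) for y, z, wi in zip(outcome, treated, w) if z == 0]
    mean_treated = weighted_mean([y for y, _ in y1], [wi for _, wi in y1])
    mean_untreated = weighted_mean([y for y, _ in y0], [wi for _, wi in y0])
    return mean_treated - mean_untreated


# Hypothetical toy data: four subjects with assumed propensity scores.
treated = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]
outcome = [1, 0, 1, 0]
print(iptw_weights(treated, propensity))
print(iptw_ate(outcome, treated, propensity))
```

The estimator here normalizes the weights within each treatment group; other weighting variants exist and may be preferable depending on the target estimand.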


Additional excerpts

  • ...When outcomes are either binary or time-to-event in nature, prior research has suggested that at least 10 events should be observed for every covariate that is entered into a regression model (Peduzzi, Concato, Feinstein, & Holford, 1995; Peduzzi, Concato, Kemper, Holford, & Feinstein, 1996)....


BookDOI
01 Jan 2006
TL;DR: Regression models are frequently used to develop diagnostic, prognostic, and health resource utilization models in clinical, health services, outcomes, pharmacoeconomic, and epidemiologic research, and in a multitude of non-health-related areas.
Abstract: Regression models are frequently used to develop diagnostic, prognostic, and health resource utilization models in clinical, health services, outcomes, pharmacoeconomic, and epidemiologic research, and in a multitude of non-health-related areas. Regression models are also used to adjust for patient heterogeneity in randomized clinical trials, to obtain tests that are more powerful and valid than unadjusted treatment comparisons.

4,211 citations

Journal ArticleDOI
TL;DR: In virtually all medical domains, diagnostic and prognostic multivariable prediction models are being developed, validated, updated, and implemented with the aim to assist doctors and individuals in estimating probabilities and potentially influence their decision making.
Abstract: The TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) Statement includes a 22-item checklist, which aims to improve the reporting of studies developing, validating, or updating a prediction model, whether for diagnostic or prognostic purposes. The TRIPOD Statement aims to improve the transparency of the reporting of a prediction model study regardless of the study methods used. This explanation and elaboration document describes the rationale; clarifies the meaning of each item; and discusses why transparent reporting is important, with a view to assessing risk of bias and clinical usefulness of the prediction model. Each checklist item of the TRIPOD Statement is explained in detail and accompanied by published examples of good reporting. The document also provides a valuable reference of issues to consider when designing, conducting, and analyzing prediction model studies. To aid the editorial process and help peer reviewers and, ultimately, readers and systematic reviewers of prediction model studies, it is recommended that authors include a completed checklist in their submission. The TRIPOD checklist can also be downloaded from www.tripod-statement.org.

2,982 citations

Journal ArticleDOI
TL;DR: A large simulation study of influences other than EPV on confidence interval coverage, type I error, relative bias, and other model performance measures found a range of circumstances in which coverage and bias were within acceptable levels despite fewer than 10 EPV.
Abstract: The rule of thumb that logistic and Cox models should be used with a minimum of 10 outcome events per predictor variable (EPV), based on two simulation studies, may be too conservative. The authors conducted a large simulation study of other influences on confidence interval coverage, type I error, relative bias, and other model performance measures. They found a range of circumstances in which coverage and bias were within acceptable levels despite less than 10 EPV, as well as other factors that were as influential as or more influential than EPV. They conclude that this rule can be relaxed, in particular for sensitivity analyses undertaken to demonstrate adequate control of confounding.

2,943 citations


Cites background from "A simulation study of the number of..."

  • ...(2, 3) more closely, we also examined models with all binary predictors....


Journal ArticleDOI
TL;DR: The examples considered in this paper show the tension between the scientific rationale for using meta-regression and the difficult interpretative problems to which such analyses are prone.
Abstract: Appropriate methods for meta-regression applied to a set of clinical trials, and the limitations and pitfalls in interpretation, are insufficiently recognized. Here we summarize recent research focusing on these issues, and consider three published examples of meta-regression in the light of this work. One principal methodological issue is that meta-regression should be weighted to take account of both within-trial variances of treatment effects and the residual between-trial heterogeneity (that is, heterogeneity not explained by the covariates in the regression). This corresponds to random effects meta-regression. The associations derived from meta-regressions are observational, and have a weaker interpretation than the causal relationships derived from randomized comparisons. This applies particularly when averages of patient characteristics in each trial are used as covariates in the regression. Data dredging is the main pitfall in reaching reliable conclusions from meta-regression. It can only be avoided by prespecification of covariates that will be investigated as potential sources of heterogeneity. However, in practice this is not always easy to achieve. The examples considered in this paper show the tension between the scientific rationale for using meta-regression and the difficult interpretative problems to which such analyses are prone. Copyright © 2002 John Wiley & Sons, Ltd.

2,486 citations
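The weighting described in the abstract above, which accounts for both within-trial variance and residual between-trial heterogeneity, is commonly implemented by giving each trial the weight 1 / (v_i + tau^2). The sketch below fits a single-covariate weighted regression with those weights. It is illustrative only: the trial data are made up, tau^2 is passed in as a known value rather than estimated, and the function names are hypothetical.

```python
def random_effects_weights(within_var, tau2):
    """Weight each trial by the inverse of its total variance, v_i + tau^2."""
    if tau2 < 0:
        raise ValueError("tau2 must be non-negative")
    return [1.0 / (v + tau2) for v in within_var]


def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x; returns (intercept, slope)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical trials: treatment effect estimates, their variances, one covariate.
effect = [0.40, 0.30, 0.20, 0.10]
within_var = [0.04, 0.01, 0.09, 0.02]
covariate = [1.0, 2.0, 3.0, 4.0]
tau2 = 0.01  # assumed residual heterogeneity, not estimated here

weights = random_effects_weights(within_var, tau2)
intercept, slope = weighted_linear_fit(covariate, effect, weights)
print(intercept, slope)
```

Setting tau2 to zero reduces this to a fixed-effect weighting by within-trial variance alone; a positive tau2 flattens the weights so that large trials dominate less.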


Cites methods from "A simulation study of the number of..."

  • ...Statistical issues for future research into meta-regression methods include how the number of covariates that can reliably be included depends on the number of trials (and their imprecisions) [38], the handling of multi-arm trials when individual patient data are not available, the appropriate use of regression diagnostics and sensitivity analyses [39; 40], and whether there are biases in using derived statistics measured after baseline as covariates....


References
Journal ArticleDOI
TL;DR: Categorical Data Analysis.
Abstract: Categorical data analysis. Central Library of Tehran University of Medical Sciences.

10,964 citations

Journal ArticleDOI
TL;DR: In this paper, a practical guide to goodness-of-fit tests using statistics based on the empirical distribution function (EDF) is presented, and five of the leading statistics are examined.
Abstract: This article offers a practical guide to goodness-of-fit tests using statistics based on the empirical distribution function (EDF). Five of the leading statistics are examined (those often labelled D, W², V, U², A²) and three important situations: where the hypothesized distribution F(x) is completely specified and where F(x) represents the normal or exponential distribution with one or more parameters to be estimated from the data. EDF statistics are easily calculated, and the tests require only one line of significance points for each situation. They are also shown to be competitive in terms of power.

2,890 citations

Journal ArticleDOI
TL;DR: The analytical effect of the number of events per variable (EPV) in a proportional hazards regression analysis was evaluated using Monte Carlo simulation techniques for data from a randomized trial containing 673 patients and 252 deaths, in which seven predictor variables had an original significance level of p < 0.10.

1,650 citations

Journal ArticleDOI
TL;DR: The purpose in the current research was to note the frequency with which multivariable analyses now appear in general medical journals, to identify some common problems and desirable precautions in the analyses, and to determine how well the challenges are being met.
Abstract: Purpose: To review the principles of multivariable analysis and to examine the application of multivariable statistical methods in general medical literature. Data Sources: A computer-assisted sear...

1,087 citations

Journal ArticleDOI
TL;DR: The research is presented in two parts: Part I describes the data set and strategy used for the analyses, including the Monte Carlo simulation studies done to determine and compare the impact of various values of EPV in proportional hazards analytical results.

643 citations