Journal Article•DOI•

Collapsibility of Multidimensional Contingency Tables

01 Jul 1978 - Journal of the Royal Statistical Society, Series B (Methodological) (John Wiley & Sons, Ltd) - Vol. 40, Iss. 3, pp. 328-340
About: This article is published in the Journal of the Royal Statistical Society, Series B (Methodological). The article was published on 1978-07-01. It has received 154 citations to date. The article focuses on the topic: Contingency table.
Citations
Journal Article•DOI•
TL;DR: Causal diagrams can provide a starting point for identifying variables that must be measured and controlled to obtain unconfounded effect estimates and provide a method for critical evaluation of traditional epidemiologic criteria for confounding.
Abstract: Causal diagrams have a long history of informal use and, more recently, have undergone formal development for applications in expert systems and robotics. We provide an introduction to these developments and their use in epidemiologic research. Causal diagrams can provide a starting point for identifying variables that must be measured and controlled to obtain unconfounded effect estimates. They also provide a method for critical evaluation of traditional epidemiologic criteria for confounding. In particular, they reveal certain heretofore unnoticed shortcomings of those criteria when used in considering multiple potential confounders. We show how to modify the traditional criteria to correct those shortcomings.

2,983 citations

Journal Article•DOI•
TL;DR: This paper showed that logistic regression estimates do not behave like linear regression estimates in one important respect: they are affected by omitted variables, even when these variables are unrelated to the independent variables in the model.
Abstract: Logistic regression estimates do not behave like linear regression estimates in one important respect: They are affected by omitted variables, even when these variables are unrelated to the independent variables in the model. This fact has important implications that have gone largely unnoticed by sociologists. Importantly, we cannot straightforwardly interpret log-odds ratios or odds ratios as effect measures, because they also reflect the degree of unobserved heterogeneity in the model. In addition, we cannot compare log-odds ratios or odds ratios for similar models across groups, samples, or time points, or across models with different independent variables in a sample. This article discusses these problems and possible ways of overcoming them.
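The non-collapsibility described in this abstract can be illustrated with a short numerical sketch. It is not taken from the cited paper: the function names, the coefficient values, and the assumption of a single binary omitted variable z that is independent of x are all hypothetical choices made for illustration.

```python
import math

def expit(z):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def marginal_odds_ratio(beta_x, beta_z, intercept=0.0, p_z=0.5):
    """Odds ratio for binary x after averaging over a binary z.

    Assumed (hypothetical) model: logit P(y=1 | x, z) = intercept + beta_x*x + beta_z*z,
    with z independent of x and P(z=1) = p_z.
    """
    def p_y_given_x(x):
        # Average the outcome probability over the omitted variable z.
        return ((1.0 - p_z) * expit(intercept + beta_x * x)
                + p_z * expit(intercept + beta_x * x + beta_z))
    return odds(p_y_given_x(1)) / odds(p_y_given_x(0))

# Conditional odds ratio for x (the same within each level of z).
conditional_or = math.exp(1.0)
# Odds ratio for x when z is omitted, even though z is independent of x.
marginal_or = marginal_odds_ratio(beta_x=1.0, beta_z=3.0)
print(conditional_or, marginal_or)
```

Under these assumptions the marginal odds ratio is pulled toward 1 relative to the conditional one, and the two coincide when beta_z is 0, that is, when z has no effect on the outcome.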

2,416 citations


Cites methods from "Collapsibility of Multidimensional ..."

  • ...For the case of log-linear models and cross-tabular analysis the conflict between averages on the probability and on the OR scale has been discussed in terms of the collapsibility of OR over partial tables (Whittemore, 1978; Ducharme and Lepage, 1986)....

    [...]

Journal Article•DOI•
TL;DR: The authors compared the performance of several such strategies for fitting multiplicative Poisson regression models to cohort data, finding that the change-in-estimate and equivalence-test-of-the-difference strategies performed best when the cut-point for deciding whether crude and adjusted estimates differed by an important amount was set to a low value.
Abstract: In the absence of prior knowledge about population relations, investigators frequently employ a strategy that uses the data to help them decide whether to adjust for a variable. The authors compared the performance of several such strategies for fitting multiplicative Poisson regression models to cohort data: 1) the "change-in-estimate" strategy, in which a variable is controlled if the adjusted and unadjusted estimates differ by some important amount; 2) the "significance-test-of-the-covariate" strategy, in which a variable is controlled if its coefficient is significantly different from zero at some predetermined significance level; 3) the "significance-test-of-the-difference" strategy, which tests the difference between the adjusted and unadjusted exposure coefficients; 4) the "equivalence-test-of-the-difference" strategy, which significance-tests the equivalence of the adjusted and unadjusted exposure coefficients; and 5) a hybrid strategy that takes a weighted average of adjusted and unadjusted estimates. Data were generated from 8,100 population structures at each of several sample sizes. The performance of the different strategies was evaluated by computing bias, mean squared error, and coverage rates of confidence intervals. At least one variation of each strategy that was examined performed acceptably. The change-in-estimate and equivalence-test-of-the-difference strategies performed best when the cut-point for deciding whether crude and adjusted estimates differed by an important amount was set to a low value (10%). The significance test strategies performed best when the alpha level was set to much higher than conventional levels (0.20).
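The "change-in-estimate" rule in this abstract can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name is hypothetical, and measuring the difference as a relative change against the crude estimate is an assumed definition that the abstract does not spell out. The 10% default mirrors the low cut-point the abstract reports as performing best.

```python
def change_in_estimate(crude, adjusted, cutoff=0.10):
    """Return True if the covariate should be controlled.

    Assumption: "differ by some important amount" is taken to mean that the
    relative change from the crude to the adjusted estimate exceeds `cutoff`.
    """
    relative_change = abs(adjusted - crude) / abs(crude)
    return relative_change > cutoff

# Hypothetical rate ratios: adjustment moves 2.0 to 1.7 (a 15% change).
print(change_in_estimate(2.0, 1.7))
# Hypothetical rate ratios: adjustment moves 2.0 to 1.9 (a 5% change).
print(change_in_estimate(2.0, 1.9))
```

With the 10% cut-point, the first hypothetical covariate would be retained for adjustment and the second would not.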

2,158 citations


Cites background from "Collapsibility of Multidimensional ..."

  • ...5) Significance-test the estimate difference (STD) (5, 6, 18): Use the adjusted estimate of effect if a collapsibility test of $(\beta_{adj} - \beta_{cru}) = 0$ rejects at significance level $\alpha$,...

    [...]

  • ...4) Significance test the change in estimate: Select a variable for control only if the change in the exposure-effect estimate produced by control of the variable is statistically significant (6)....

    [...]

Journal Article•DOI•
TL;DR: An overview of problems in multivariate modeling of epidemiologic data is provided, and some proposed solutions are examined; among the conclusions, model and variable forms should be selected based on regression diagnostic procedures in addition to goodness-of-fit tests.
Abstract: This paper provides an overview of problems in multivariate modeling of epidemiologic data, and examines some proposed solutions. Special attention is given to the task of model selection, which involves selection of the model form, selection of the variables to enter the model, and selection of the form of these variables in the model. Several conclusions are drawn, among them: a) model and variable forms should be selected based on regression diagnostic procedures, in addition to goodness-of-fit tests; b) variable-selection algorithms in current packaged programs, such as conventional stepwise regression, can easily lead to invalid estimates and tests of effect; and c) variable selection is better approached by direct estimation of the degree of confounding produced by each variable than by significance-testing algorithms. As a general rule, before using a model to estimate effects, one should evaluate the assumptions implied by the model against both the data and prior information.

2,117 citations

Journal Article•DOI•
TL;DR: The authors propose two methods that account for the correlations but require only the summary estimates and marginal data from the studies, which provide more efficient estimates of regression slope, more accurate variance estimates, and more valid heterogeneity tests than those previously available.
Abstract: Meta-analysis often requires pooling of correlated estimates to compute regression slopes (trends) across different exposure or treatment levels. The authors propose two methods that account for the correlations but require only the summary estimates and marginal data from the studies. These methods provide more efficient estimates of regression slope, more accurate variance estimates, and more valid heterogeneity tests than those previously available. One method also allows estimation of nonlinear trend components, such as quadratic effects. The authors illustrate these methods in a meta-analysis of alcohol use and breast cancer.

2,052 citations


Cites background from "Collapsibility of Multidimensional ..."

  • ..., the sampling distribution is strictly collapsible (3); 2) the correlation matrices of the crude and adjusted odds ratios are approximately equal; 3) the variances of the crude odds ratios can be approximated by the usual formulas based on the multinomial or Poisson distributions....

    [...]

References
Book•
01 Jan 1975
TL;DR: Discrete Multivariate Analysis is a comprehensive text and general reference on the analysis of discrete multivariate data, particularly in the form of multidimensional tables, and contains a wealth of material on important topics.
Abstract: "At last, after a decade of mounting interest in log-linear and related models for the analysis of discrete multivariate data, particularly in the form of multidimensional tables, we now have a comprehensive text and general reference on the subject. Even a mediocre attempt to organize the extensive and widely scattered literature on discrete multivariate analysis would be welcome; happily, this is an excellent such effort, by a group of Harvard statisticians that has contributed much to the field. Their book ought to serve as a basic guide to the analysis of quantitative data for years to come." --James R. Beninger, Contemporary Sociology "A welcome addition to multivariate analysis. The discussion is lucid and very leisurely, excellently illustrated with applications drawn from a wide variety of fields. A good part of the book can be understood without very specialized statistical knowledge. It is a most welcome contribution to an interesting and lively subject." --D.R. Cox, Nature "Discrete Multivariate Analysis is an ambitious attempt to present log-linear models to a broad audience. Exposition is quite discursive, and the mathematical level, except in Chapters 12 and 14, is very elementary. To illustrate possible applications, some 60 different sets of data have been gathered together from diverse fields. To aid the reader, an index of these examples has been provided. ...the book contains a wealth of material on important topics. Its numerous examples are especially valuable." --Shelby J. Haberman, The Annals of Statistics

5,309 citations

Book•
01 Jan 1970
TL;DR: Binary response variables; special logistical analyses; some complications; some related approaches; more complex responses.
Abstract: The first edition of this book (1970) set out a systematic basis for the analysis of binary data and in particular for the study of how the probability of 'success' depends on explanatory variables. The first edition has been widely used and the general level and style have been preserved in the second edition, which contains a substantial amount of new material. This amplifies matters dealt with only cryptically in the first edition and includes many more recent developments. In addition the whole material has been reorganized, in particular to put more emphasis on maximum likelihood methods. There are nearly 60 further results and exercises. The main points are illustrated by practical examples, many of them not in the first edition, and some general essential background material is set out in new Appendices.

2,855 citations

Journal Article•DOI•
TL;DR: The special cases of linear functions and logarithmic functions of the $\pi_{ij}$ are developed in detail, and some examples of how the general approach can be used to analyze various types of categorical data are presented.
Abstract: Assume there are $n_{i\cdot}$, $i = 1, 2, \ldots, s$, samples from $s$ multinomial distributions each having $r$ categories of response. Then define any $u$ functions of the unknown true cell probabilities $\{\pi_{ij}: i = 1, 2, \ldots, s;\ j = 1, 2, \ldots, r\}$, where $\sum_j \pi_{ij} = 1$, that have derivatives up to the second order with respect to $\pi_{ij}$, and for which the matrix of first derivatives is of rank $u$. A general noniterative procedure is described for fitting these functions to a linear model, for testing the goodness-of-fit of the model, and for testing hypotheses about the parameters in the linear model. The special cases of linear functions and logarithmic functions of the $\pi_{ij}$ are developed in detail, and some examples of how the general approach can be used to analyze various types of categorical data are presented.

1,515 citations

Journal Article•DOI•

789 citations