Monograph

Categorical data analysis

01 May 1993, Contemporary Sociology (Wiley Interscience), Vol. 22, Iss. 1, p. 459
TL;DR: This monograph covers methods for analyzing categorical data, including contingency tables, generalized linear models, logistic regression, loglinear models, and models for matched pairs, repeated measures, and clustered data.
Abstract: Preface.
1. Introduction: Distributions and Inference for Categorical Data. 1.1 Categorical Response Data. 1.2 Distributions for Categorical Data. 1.3 Statistical Inference for Categorical Data. 1.4 Statistical Inference for Binomial Parameters. 1.5 Statistical Inference for Multinomial Parameters. Notes. Problems.
2. Describing Contingency Tables. 2.1 Probability Structure for Contingency Tables. 2.2 Comparing Two Proportions. 2.3 Partial Association in Stratified 2 x 2 Tables. 2.4 Extensions for I x J Tables. Notes. Problems.
3. Inference for Contingency Tables. 3.1 Confidence Intervals for Association Parameters. 3.2 Testing Independence in Two-Way Contingency Tables. 3.3 Following Up Chi-Squared Tests. 3.4 Two-Way Tables with Ordered Classifications. 3.5 Small-Sample Tests of Independence. 3.6 Small-Sample Confidence Intervals for 2 x 2 Tables*. 3.7 Extensions for Multiway Tables and Nontabulated Responses. Notes. Problems.
4. Introduction to Generalized Linear Models. 4.1 Generalized Linear Model. 4.2 Generalized Linear Models for Binary Data. 4.3 Generalized Linear Models for Counts. 4.4 Moments and Likelihood for Generalized Linear Models*. 4.5 Inference for Generalized Linear Models. 4.6 Fitting Generalized Linear Models. 4.7 Quasi-likelihood and Generalized Linear Models*. 4.8 Generalized Additive Models*. Notes. Problems.
5. Logistic Regression. 5.1 Interpreting Parameters in Logistic Regression. 5.2 Inference for Logistic Regression. 5.3 Logit Models with Categorical Predictors. 5.4 Multiple Logistic Regression. 5.5 Fitting Logistic Regression Models. Notes. Problems.
6. Building and Applying Logistic Regression Models. 6.1 Strategies in Model Selection. 6.2 Logistic Regression Diagnostics. 6.3 Inference About Conditional Associations in 2 x 2 x K Tables. 6.4 Using Models to Improve Inferential Power. 6.5 Sample Size and Power Considerations*. 6.6 Probit and Complementary Log-Log Models*. 6.7 Conditional Logistic Regression and Exact Distributions*. Notes. Problems.
7. Logit Models for Multinomial Responses. 7.1 Nominal Responses: Baseline-Category Logit Models. 7.2 Ordinal Responses: Cumulative Logit Models. 7.3 Ordinal Responses: Cumulative Link Models. 7.4 Alternative Models for Ordinal Responses*. 7.5 Testing Conditional Independence in I x J x K Tables*. 7.6 Discrete-Choice Multinomial Logit Models*. Notes. Problems.
8. Loglinear Models for Contingency Tables. 8.1 Loglinear Models for Two-Way Tables. 8.2 Loglinear Models for Independence and Interaction in Three-Way Tables. 8.3 Inference for Loglinear Models. 8.4 Loglinear Models for Higher Dimensions. 8.5 The Loglinear-Logit Model Connection. 8.6 Loglinear Model Fitting: Likelihood Equations and Asymptotic Distributions*. 8.7 Loglinear Model Fitting: Iterative Methods and Their Application*. Notes. Problems.
9. Building and Extending Loglinear/Logit Models. 9.1 Association Graphs and Collapsibility. 9.2 Model Selection and Comparison. 9.3 Diagnostics for Checking Models. 9.4 Modeling Ordinal Associations. 9.5 Association Models*. 9.6 Association Models, Correlation Models, and Correspondence Analysis*. 9.7 Poisson Regression for Rates. 9.8 Empty Cells and Sparseness in Modeling Contingency Tables. Notes. Problems.
10. Models for Matched Pairs. 10.1 Comparing Dependent Proportions. 10.2 Conditional Logistic Regression for Binary Matched Pairs. 10.3 Marginal Models for Square Contingency Tables. 10.4 Symmetry, Quasi-symmetry, and Quasi-independence. 10.5 Measuring Agreement Between Observers. 10.6 Bradley-Terry Model for Paired Preferences. 10.7 Marginal Models and Quasi-symmetry Models for Matched Sets*. Notes. Problems.
11. Analyzing Repeated Categorical Response Data. 11.1 Comparing Marginal Distributions: Multiple Responses. 11.2 Marginal Modeling: Maximum Likelihood Approach. 11.3 Marginal Modeling: Generalized Estimating Equations Approach. 11.4 Quasi-likelihood and Its GEE Multivariate Extension: Details*. 11.5 Markov Chains: Transitional Modeling. Notes. Problems.
12. Random Effects: Generalized Linear Mixed Models for Categorical Responses. 12.1 Random Effects Modeling of Clustered Categorical Data. 12.2 Binary Responses: Logistic-Normal Model. 12.3 Examples of Random Effects Models for Binary Data. 12.4 Random Effects Models for Multinomial Data. 12.5 Multivariate Random Effects Models for Binary Data. 12.6 GLMM Fitting, Inference, and Prediction. Notes. Problems.
13. Other Mixture Models for Categorical Data*. 13.1 Latent Class Models. 13.2 Nonparametric Random Effects Models. 13.3 Beta-Binomial Models. 13.4 Negative Binomial Regression. 13.5 Poisson Regression with Random Effects. Notes. Problems.
14. Asymptotic Theory for Parametric Models. 14.1 Delta Method. 14.2 Asymptotic Distributions of Estimators of Model Parameters and Cell Probabilities. 14.3 Asymptotic Distributions of Residuals and Goodness-of-Fit Statistics. 14.4 Asymptotic Distributions for Logit/Loglinear Models. Notes. Problems.
15. Alternative Estimation Theory for Parametric Models. 15.1 Weighted Least Squares for Categorical Data. 15.2 Bayesian Inference for Categorical Data. 15.3 Other Methods of Estimation. Notes. Problems.
16. Historical Tour of Categorical Data Analysis*. 16.1 Pearson-Yule Association Controversy. 16.2 R. A. Fisher's Contributions. 16.3 Logistic Regression. 16.4 Multiway Contingency Tables and Loglinear Models. 16.5 Recent (and Future?) Developments.
Appendix A. Using Computer Software to Analyze Categorical Data. A.1 Software for Categorical Data Analysis. A.2 Examples of SAS Code by Chapter.
Appendix B. Chi-Squared Distribution Values.
References. Examples Index. Author Index. Subject Index.
Sections marked with an asterisk are less important for an overview.
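Chapters 2 and 3 center on association measures such as the odds ratio and on chi-squared tests of independence. As a minimal sketch (not code from the book, whose Appendix A uses SAS), the hypothetical function `analyze_2x2` below computes the sample odds ratio, a Wald 95% interval on the log scale, and the Pearson X² statistic for a 2 x 2 table. It assumes all four cell counts are positive; sparse tables call for the small-sample methods of Sections 3.5 and 3.6 instead.

```python
import math


def analyze_2x2(n11, n12, n21, n22):
    """Odds ratio, Wald 95% CI (log scale), and Pearson X^2 for a 2 x 2 table.

    Assumes every cell count is positive (no zero-cell correction is applied).
    """
    # Sample odds ratio and the standard error of its logarithm
    odds_ratio = (n11 * n22) / (n12 * n21)
    se_log = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    z = 1.959964  # 0.975 quantile of the standard normal
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - z * se_log), math.exp(log_or + z * se_log))

    # Pearson X^2: expected counts under independence are row_total * col_total / n
    table = [[n11, n12], [n21, n22]]
    n = n11 + n12 + n21 + n22
    rows = [n11 + n12, n21 + n22]
    cols = [n11 + n21, n12 + n22]
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            x2 += (table[i][j] - expected) ** 2 / expected

    # A 2 x 2 table has 1 degree of freedom: P(chi2_1 > x2) = erfc(sqrt(x2 / 2))
    p_value = math.erfc(math.sqrt(x2 / 2))
    return odds_ratio, ci, x2, p_value


odds_ratio, ci, x2, p_value = analyze_2x2(10, 20, 30, 40)
# odds_ratio is about 0.667 and x2 is about 0.794
```

The interval is built on the log scale because the sampling distribution of the log odds ratio is closer to normal than that of the odds ratio itself.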
Citations
Journal Article
TL;DR: A method based on the negative binomial distribution, with variance and mean linked by local regression, is proposed and an implementation, DESeq, as an R/Bioconductor package is presented.
Abstract: High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression, and present an implementation, DESeq, as an R/Bioconductor package.

13,356 citations
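As a minimal sketch of the error model named in this abstract, the hypothetical function `nb_pmf` evaluates a negative binomial probability directly from a mean and a variance. It is not DESeq's code: in DESeq the variance for a given mean would come from the fitted mean-variance relation, whereas here the caller simply supplies it.

```python
import math


def nb_pmf(k, mean, var):
    """Negative binomial P(K = k) parameterized by its mean and variance.

    Requires var > mean (overdispersion relative to the Poisson). With
    r = mean^2 / (var - mean) and p = mean / var, K counts failures before the
    r-th success, so E[K] = r(1 - p)/p = mean and Var[K] = r(1 - p)/p^2 = var.
    """
    if var <= mean:
        raise ValueError("negative binomial needs var > mean")
    r = mean * mean / (var - mean)
    p = mean / var
    # Work on the log scale via lgamma so non-integer r and large k are fine
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


print(nb_pmf(0, 5.0, 10.0))  # r = 5, p = 0.5, so P(K = 0) = 0.5**5, about 0.03125
```

The mean and variance parameterization makes the role of the variance estimate explicit: as the variance approaches the mean, r grows without bound and the distribution approaches the Poisson.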

Journal Article
TL;DR: In this paper, the authors provide a conceptual framework that reflects the joint activities of risk assessment and risk mitigation that are fundamental to disruption risk management in supply chains, and consider empirical results from a rich data set covering the period 1995-2000 on accidents in the U. S. Chemical Industry.
Abstract: There are two broad categories of risk affecting supply chain design and management: (1) risks arising from the problems of coordinating supply and demand, and (2) risks arising from disruptions to normal activities. This paper is concerned with the second category of risks, which may arise from natural disasters, from strikes and economic disruptions, and from acts of purposeful agents, including terrorists. The paper provides a conceptual framework that reflects the joint activities of risk assessment and risk mitigation that are fundamental to disruption risk management in supply chains. We then consider empirical results from a rich data set covering the period 1995–2000 on accidents in the U. S. Chemical Industry. Based on these results and other literature, we discuss the implications for the design of management systems intended to cope with supply chain disruption risks.

1,771 citations

Journal Article
TL;DR: This work proposes three methods based on the highest rank, the Borda count, and logistic regression for class set reranking that have been tested in applications of degraded machine-printed characters and words from large lexicons, resulting in substantial improvement in overall correctness.
Abstract: A multiple classifier system is a powerful solution to difficult pattern recognition problems involving large class sets and noisy input because it allows simultaneous use of arbitrary feature descriptors and classification procedures. Decisions by the classifiers can be represented as rankings of classifiers and different instances of a problem. The rankings can be combined by methods that either reduce or rerank a given set of classes. An intersection method and union method are proposed for class set reduction. Three methods based on the highest rank, the Borda count, and logistic regression are proposed for class set reranking. These methods have been tested in applications of degraded machine-printed characters and words from large lexicons, resulting in substantial improvement in overall correctness.

1,703 citations
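The Borda count reranking mentioned in this abstract can be sketched as follows. The function name `borda_rerank` and the alphabetical tie-breaking rule are assumptions for illustration, not details taken from the paper: in each classifier's ranking a class earns one point for every class ranked below it, and classes are reordered by their summed points.

```python
from collections import defaultdict


def borda_rerank(rankings):
    """Combine several best-first rankings of classes with a Borda count.

    A class at position i of a ranking of length m earns m - 1 - i points,
    i.e. the number of classes ranked below it. A class absent from a ranking
    earns nothing from that ranking. Ties are broken alphabetically
    (an illustrative choice).
    """
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for position, cls in enumerate(ranking):
            scores[cls] += m - 1 - position
    return sorted(scores, key=lambda c: (-scores[c], c))


# Three classifiers rank the candidate classes a, b, c
print(borda_rerank([["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]))
# totals are a = 3, b = 5, c = 1, so the combined order is ['b', 'a', 'c']
```

Because the Borda count uses only rank positions, it can merge classifiers whose raw scores are on incomparable scales.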

Journal Article
TL;DR: An overview of statistical approaches to population association studies, including preliminary analyses (Hardy–Weinberg equilibrium testing, inference of phase and missing data, and SNP tagging), and single-SNP and multipoint tests for association.
Abstract: Although genetic association studies have been with us for many years, even for the simplest analyses there is little consensus on the most appropriate statistical procedures. Here I give an overview of statistical approaches to population association studies, including preliminary analyses (Hardy-Weinberg equilibrium testing, inference of phase and missing data, and SNP tagging), and single-SNP and multipoint tests for association. My goal is to outline the key methods with a brief discussion of problems (population structure and multiple testing), avenues for solutions and some ongoing developments.

1,429 citations
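A minimal sketch of one of the preliminary analyses listed in this abstract: a Pearson chi-squared test for Hardy-Weinberg equilibrium at a single biallelic SNP. The function `hwe_chi2` is hypothetical, and with small genotype counts or rare alleles an exact test is usually preferred over this asymptotic version.

```python
import math


def hwe_chi2(n_AA, n_Aa, n_aa):
    """Pearson chi-squared test of Hardy-Weinberg equilibrium for one SNP.

    Returns (statistic, p_value) on 1 degree of freedom: 3 genotype classes,
    minus 1, minus 1 for the estimated allele frequency. Both alleles must be
    observed, otherwise an expected count is zero.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_AA, n_Aa, n_aa]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared, 1 df
    return stat, p_value


print(hwe_chi2(50, 0, 50))  # no heterozygotes: statistic = 100.0, tiny p-value
```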

Journal Article
TL;DR: A large class of models, including several generalizations of stochastic block models, as well as models parameterizing global tendencies towards clustering and centralization, and individual differences in such tendencies are described and extended.
Abstract: Spanning nearly sixty years of research, statistical network analysis has passed through (at least) two generations of researchers and models. Beginning in the late 1930's, the first generation of research dealt with the distribution of various network statistics, under a variety of null models. The second generation, beginning in the 1970's and continuing into the 1980's, concerned models, usually for probabilities of relational ties among very small subsets of actors, in which various simple substantive tendencies were parameterized. Much of this research, most of which utilized loglinear models, first appeared in applied statistics publications. But recent developments in social network analysis promise to bring us into a third generation. The Markov random graphs of Frank and Strauss (1986) and especially the estimation strategy for these models developed by Strauss and Ikeda (1990; described in brief in Strauss, 1992), are very recent and promising contributions to this field. Here we describe a large class of models that can be used to investigate structure in social networks. These models include several generalizations of stochastic blockmodels, as well as models parameterizing global tendencies towards clustering and centralization, and individual differences in such tendencies. Approximate model fits are obtained using Strauss and Ikeda's (1990) estimation strategy. In this paper we describe and extend these models and demonstrate how they can be used to address a variety of substantive questions about structure in social networks.

1,250 citations
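The Strauss and Ikeda (1990) strategy cited here is maximum pseudolikelihood: each tie indicator is modeled by a logistic regression on its change statistics, the change in each model statistic when that tie is toggled on. The sketch below assumes an undirected graph without self-loops and a toy model with only edge and triangle terms, a simplification of the models in the paper; `change_stats` and `log_pseudolikelihood` are hypothetical names. Maximizing `log_pseudolikelihood` over `theta`, for example with any logistic-regression routine, gives the pseudolikelihood estimate.

```python
import math
from itertools import combinations


def change_stats(n, edges):
    """For each dyad (i, j) with i < j, return (y_ij, [edge_change, triangle_change]).

    Toggling tie i-j on raises the edge count by 1 and the triangle count by
    the number of common neighbours of i and j.
    """
    nbrs = {v: set() for v in range(n)}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    rows = []
    for i, j in combinations(range(n), 2):
        y = 1 if j in nbrs[i] else 0
        common = len(nbrs[i] & nbrs[j])  # neither i nor j can appear here
        rows.append((y, [1, common]))
    return rows


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_pseudolikelihood(theta, rows):
    """Sum over dyads of log P(y_ij | rest of graph), with logit = theta . delta_ij."""
    total = 0.0
    for y, delta in rows:
        eta = sum(t * d for t, d in zip(theta, delta))
        total += y * eta - softplus(eta)
    return total


# Path graph 0-1-2: the missing tie 0-2 would close one triangle
rows = change_stats(3, [(0, 1), (1, 2)])
print(rows)  # [(1, [1, 0]), (0, [1, 1]), (1, [1, 0])]
print(log_pseudolikelihood([0.5, -1.0], rows))
```

Pseudolikelihood treats the dyads as if they were independent given their change statistics, which is why the abstract describes the resulting fits as approximate.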