Generalized Linear Models

Home
/
Papers
/
Generalized Linear Models

Book•

Generalized Linear Models

Peter McCullagh¹, John A. Nelder•Institutions (1)

01 Jan 1983-

TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log- likelihoods, illustrated by examples relating to four distributions; the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).

read less

Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log- likelihoods. These generalized linear models are illustrated by examples relating to four distributions; the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).

...read moreread less

Citations

PDF

Open Access

More filters

Journal Article•DOI•

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

[...]

Michael I. Love¹, Michael I. Love², Wolfgang Huber, Simon Anders•Institutions (2)

Harvard University¹, Max Planck Society²

05 Dec 2014-Genome Biology

TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

...read moreread less

Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

...read moreread less

47,038 citations

Book•

Econometric Analysis of Cross Section and Panel Data

[...]

Jeffrey M. Wooldridge

01 Jan 2001

TL;DR: This is the essential companion to Jeffrey Wooldridge's widely-used graduate text Econometric Analysis of Cross Section and Panel Data (MIT Press, 2001).

...read moreread less

Abstract: The second edition of this acclaimed graduate text provides a unified treatment of two methods used in contemporary econometric research, cross section and data panel methods. By focusing on assumptions that can be given behavioral content, the book maintains an appropriate level of rigor while emphasizing intuitive thinking. The analysis covers both linear and nonlinear models, including models with dynamics and/or individual heterogeneity. In addition to general estimation frameworks (particular methods of moments and maximum likelihood), specific linear and nonlinear methods are covered in detail, including probit and logit models and their multivariate, Tobit models, models for count data, censored and missing data schemes, causal (or treatment) effects, and duration analysis. Econometric Analysis of Cross Section and Panel Data was the first graduate econometrics text to focus on microeconomic data structures, allowing assumptions to be separated into population and sampling assumptions. This second edition has been substantially updated and revised. Improvements include a broader class of models for missing data problems; more detailed treatment of cluster problems, an important topic for empirical researchers; expanded discussion of "generalized instrumental variables" (GIV) estimation; new coverage (based on the author's own recent research) of inverse probability weighting; a more complete framework for estimating treatment effects with panel data, and a firmly established link between econometric approaches to nonlinear panel data and the "generalized estimating equation" literature popular in statistics and other fields. New attention is given to explaining when particular econometric methods can be applied; the goal is not only to tell readers what does work, but why certain "obvious" procedures do not. The numerous included exercises, both theoretical and computer-based, allow the reader to extend methods covered in the text and discover new insights.

...read moreread less

28,298 citations

Book•

Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers

[...]

Stephen Boyd¹, Neal Parikh¹, Eric Chu¹, Borja Peleato¹, Jonathan Eckstein² - Show less +1 more•Institutions (2)

Stanford University¹, Rutgers University²

23 May 2011

TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.

...read moreread less

Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.

...read moreread less

17,433 citations

Journal Article•DOI•

Longitudinal data analysis using generalized linear models

[...]

Kung Yee Liang¹, Scott L. Zeger¹•Institutions (1)

Johns Hopkins University¹

01 Apr 1986-Biometrika

TL;DR: In this article, an extension of generalized linear models to the analysis of longitudinal data is proposed, which gives consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence.

...read moreread less

Abstract: SUMMARY This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelih ood. Some key ironh: Estimating equation; Generalized linear model; Longitudinal data; Quasi-likelihood; Repeated measures.

...read moreread less

17,111 citations

Posted Content•DOI•

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

[...]

Michael I. Love¹, Wolfgang Huber, Simon Anders•Institutions (1)

Harvard University¹

17 Nov 2014-bioRxiv

...read moreread less

Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.

...read moreread less

17,014 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Book Chapter•DOI•

Maximum Likelihood in Three-Way Contingency Tables

[...]

Martin W. Birch¹•Institutions (1)

University of Liverpool¹

01 Jan 1963-Journal of the royal statistical society series b-methodological

TL;DR: In this article, maximum likelihood estimation for many-way and three-way contingency tables is discussed and the solutions given for threeway tables in the cases of greatest interest are given.

...read moreread less

Abstract: Interactions in three-way and many-way contingency tables arc defined as certain linear combinations of the logarithms of the expected frequencies. Maximum-likelihood estimation is discussed for many-way tables and the solutions given for three-way tables in the cases of greatest interest.

...read moreread less

332 citations

Journal Article•DOI•

Inverse Polynomials, a Useful Group of Multi-Factor Response Functions

[...]

J. A. Nelder

01 Mar 1966-Biometrics

297 citations

Journal Article•DOI•

A Reformulation of Linear Models

[...]

J. A. Nelder¹•Institutions (1)

The Hertz Corporation¹

01 Jan 1977

TL;DR: In this article, a reformulation of linear models is proposed to integrate finite and infinite populations, random and fixed effects, excess and deficit of variance, to avoid unnecessary constraints on parameters, and to lead naturally to interesting hypotheses about the model terms.

...read moreread less

Abstract: SUMMARY Dissatisfaction is expressed with aspects of the current exposition of linear models, including the neglect of marginality, unnecessary differences between models for finite and infinite populations, failure to distinguish different kinds of random terms, impositon of unnecessary and inconsistent constraints on parameters, and lack of an adequate notation for negative components of variance. The reformulation, exemplified for crossed and nested classifications of balanced data, and for simple orthogonal designed experiments, is designed to integrate finite and infinite populations, random and fixed effects, excess and deficit of variance, to avoid unnecessary constraints on parameters, and to lead naturally to interesting hypotheses about the model terms.

...read moreread less

292 citations

Book•

Analysing qualitative data

[...]

Albert Ernest Maxwell

01 Jan 1964

261 citations

Journal Article•DOI•

Association Models and Canonical Correlation in the Analysis of Cross-Classifications Having Ordered Categories

[...]

Leo A. Goodman¹•Institutions (1)

University of Chicago¹

01 Jun 1981-Journal of the American Statistical Association

TL;DR: In this paper, the authors compared the results obtained using association models with those obtained using the earlier canonical correlation approach for cross-classification having ordered categories and showed that association models can provide meaningful scores for the row and column categories and these scores can be used to partition into relevant components the usual chi-squared statistic for testing the null hypothesis of statistical independence between the row classification and column classification.

...read moreread less

Abstract: The association models considered in Goodman (1979a) for the analysis of cross-classifications having ordered categories are presented in a somewhat different form in the present article to facilitate comparison of the results obtained using these models with those obtained using the earlier canonical correlation approach. Both the association models and the canonical correlation approach can provide meaningful scores for the row and column categories, and these scores can be used to partition into relevant components the usual chi-squared statistic for testing the null hypothesis of statistical independence between the row classification and column classification. However, while the usual procedure for testing the statistical significance of chi-squared components is invalid when these components are based on the canonical correlations, a corresponding procedure is valid when these components are obtained with the association models. The components of association obtained with the association mo...

...read moreread less

235 citations