Generalized Linear Models

Book

Peter McCullagh, John A. Nelder

01 Jan 1983

TL;DR: A generalization of the analysis of variance is given for generalized linear models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).

Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).
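
The iterative weighted linear regression mentioned in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a Poisson response with the canonical log link and uses NumPy, and the function name `irls_poisson`, its parameters, and the synthetic data are hypothetical choices made for the example.

```python
import numpy as np


def irls_poisson(X, y, n_iter=25, tol=1e-8):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares.

    X is an (n, p) design matrix and y an (n,) vector of counts.
    Returns the (p,) vector of estimated coefficients.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta               # linear predictor
        mu = np.exp(eta)             # mean via the inverse of the log link
        w = mu                       # working weights: (dmu/deta)^2 / Var(y) = mu
        z = eta + (y - mu) / mu      # working (adjusted) response
        XtW = X.T * w                # X^T W without forming the diagonal matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)  # weighted least squares step
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Illustrative usage on synthetic data (coefficients chosen arbitrarily).
rng = np.random.default_rng(0)
X_demo = np.column_stack([np.ones(500), rng.normal(size=500)])
y_demo = rng.poisson(np.exp(X_demo @ np.array([0.5, 0.3])))
beta_hat = irls_poisson(X_demo, y_demo)
print(beta_hat)
```

Each pass solves an ordinary weighted least squares problem on a working response, so at convergence the estimates satisfy the likelihood equations. Other exponential-family distributions and links would change only the lines that compute `mu`, `w`, and `z`.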

Citations

Journal Article

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Michael I. Love¹,², Wolfgang Huber, Simon Anders

Harvard University¹, Max Planck Society²

05 Dec 2014-Genome Biology

TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

47,038 citations

Book

Econometric Analysis of Cross Section and Panel Data

Jeffrey M. Wooldridge

01 Jan 2001

TL;DR: This graduate text provides a unified treatment of cross section and panel data methods in econometrics, covering linear and nonlinear models with an emphasis on assumptions that can be given behavioral content.

Abstract: The second edition of this acclaimed graduate text provides a unified treatment of two methods used in contemporary econometric research, cross section and panel data methods. By focusing on assumptions that can be given behavioral content, the book maintains an appropriate level of rigor while emphasizing intuitive thinking. The analysis covers both linear and nonlinear models, including models with dynamics and/or individual heterogeneity. In addition to general estimation frameworks (particularly methods of moments and maximum likelihood), specific linear and nonlinear methods are covered in detail, including probit and logit models and their multivariate extensions, Tobit models, models for count data, censored and missing data schemes, causal (or treatment) effects, and duration analysis. Econometric Analysis of Cross Section and Panel Data was the first graduate econometrics text to focus on microeconomic data structures, allowing assumptions to be separated into population and sampling assumptions. This second edition has been substantially updated and revised. Improvements include a broader class of models for missing data problems; more detailed treatment of cluster problems, an important topic for empirical researchers; expanded discussion of "generalized instrumental variables" (GIV) estimation; new coverage (based on the author's own recent research) of inverse probability weighting; a more complete framework for estimating treatment effects with panel data; and a firmly established link between econometric approaches to nonlinear panel data and the "generalized estimating equation" literature popular in statistics and other fields. New attention is given to explaining when particular econometric methods can be applied; the goal is not only to tell readers what does work, but why certain "obvious" procedures do not.
The numerous included exercises, both theoretical and computer-based, allow the reader to extend methods covered in the text and discover new insights.

28,298 citations

Book

Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers

Stephen Boyd¹, Neal Parikh¹, Eric Chu¹, Borja Peleato¹, Jonathan Eckstein²

Stanford University¹, Rutgers University²

23 May 2011

TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.

Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.

17,433 citations

Journal Article

Longitudinal data analysis using generalized linear models

Kung Yee Liang¹, Scott L. Zeger¹

Johns Hopkins University¹

01 Apr 1986-Biometrika

TL;DR: In this article, an extension of generalized linear models to the analysis of longitudinal data is proposed, which gives consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence.

Abstract: This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations, yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood. Some key words: Estimating equation; Generalized linear model; Longitudinal data; Quasi-likelihood; Repeated measures.

17,111 citations

Posted Content

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Michael I. Love¹, Wolfgang Huber, Simon Anders

Harvard University¹

17 Nov 2014-bioRxiv

Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.

17,014 citations

References

Book Chapter

Analysis of variance

Brian Williams

28 Jul 2017

TL;DR: In this paper, the authors proposed a method to improve the quality of the data collected by the Japanese government through the use of data from the internet and social media, such as Facebook and Twitter.

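Abstract: Part 1: 1888-1908 (Childhood fantasies; The pains of adolescence; "Juliet": an autobiographical experiment; Forbidden love; The consolation of art; Feeling versus will). Part 2: 1908-1915 (London and a double life; The stories of "In a German Pension"; Solitude and danger; Middleton Murry and the theme of childhood; Reality versus dream; Role-playing). Part 3: 1915-1918 (A brother's death; "Prelude"; Garsington as fiction; "I Do Not Speak French"; A nightmare marriage). Part 4: 1918-1923 ("The Man Who Lost His Composure"; Mother, father and "The Daughters of the Late Colonel"; More thrilling than love: honesty; "At the Bay"; Haunted by death; Postscript).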

1,382 citations

Journal Article

Quasi-Likelihood Functions

Peter McCullagh

01 Mar 1983-Annals of Statistics

TL;DR: In this paper, the connection between quasi-likelihood functions, exponential family models and nonlinear weighted least squares is examined and consistency and asymptotic normality of the parameter estimates are discussed under second moment assumptions.

Abstract: The connection between quasi-likelihood functions, exponential family models and nonlinear weighted least squares is examined. Consistency and asymptotic normality of the parameter estimates are discussed under second moment assumptions. The parameter estimates are shown to satisfy a property of asymptotic optimality similar in spirit to, but more general than, the corresponding optimal property of Gauss-Markov estimators.

763 citations

Journal Article

The GLIM System Release 3.

D. Collett, R. J. Baker, John A. Nelder

01 Jun 1979-Biometrics

701 citations

Journal Article

Contingency tables with given marginals

C. T. Ireland¹, S. Kullback¹

George Washington University¹

01 Mar 1968-Biometrika

TL;DR: An iterative procedure is given for estimating the cell probabilities of an r x c contingency table whose marginal probabilities are known and fixed; the estimates are shown to be BAN and the procedure convergent, and results for a four-way table are summarized.

Abstract: In its simplest formulation the problem considered is to estimate the cell probabilities $p_{ij}$ of an $r \times c$ contingency table for which the marginal probabilities $p_{i\cdot}$ and $p_{\cdot j}$ are known and fixed, so as to minimize $\sum_i \sum_j p_{ij} \ln(p_{ij}/\pi_{ij})$, where $\pi_{ij}$ are the corresponding entries in a given contingency table. An iterative procedure is given for determining the estimates and it is shown that the estimates are BAN, and that the iterative procedure is convergent. A summary of results for a four-way contingency table is given. An illustrative example is given.

399 citations

Journal Article

A New Analysis of Variance Model for Non-additive Data

John Mandel

01 Feb 1971-Technometrics

TL;DR: All advantage of the additive model will be lost, unless one can again partition the non-random portion of n7.i into functions of only one variable each.

Abstract: (1971). A New Analysis of Variance Model for Non-additive Data. Technometrics: Vol. 13, No. 1, pp. 1-18.

350 citations