scispace - formally typeset
Search or ask a question
Book

Generalized Linear Models

01 Jan 1983-
TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log- likelihoods, illustrated by examples relating to four distributions; the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log- likelihoods. These generalized linear models are illustrated by examples relating to four distributions; the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).
Citations
More filters
Journal ArticleDOI
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

47,038 citations

Book
01 Jan 2001
TL;DR: This is the essential companion to Jeffrey Wooldridge's widely-used graduate text Econometric Analysis of Cross Section and Panel Data (MIT Press, 2001).
Abstract: The second edition of this acclaimed graduate text provides a unified treatment of two methods used in contemporary econometric research, cross section and data panel methods. By focusing on assumptions that can be given behavioral content, the book maintains an appropriate level of rigor while emphasizing intuitive thinking. The analysis covers both linear and nonlinear models, including models with dynamics and/or individual heterogeneity. In addition to general estimation frameworks (particular methods of moments and maximum likelihood), specific linear and nonlinear methods are covered in detail, including probit and logit models and their multivariate, Tobit models, models for count data, censored and missing data schemes, causal (or treatment) effects, and duration analysis. Econometric Analysis of Cross Section and Panel Data was the first graduate econometrics text to focus on microeconomic data structures, allowing assumptions to be separated into population and sampling assumptions. This second edition has been substantially updated and revised. Improvements include a broader class of models for missing data problems; more detailed treatment of cluster problems, an important topic for empirical researchers; expanded discussion of "generalized instrumental variables" (GIV) estimation; new coverage (based on the author's own recent research) of inverse probability weighting; a more complete framework for estimating treatment effects with panel data, and a firmly established link between econometric approaches to nonlinear panel data and the "generalized estimating equation" literature popular in statistics and other fields. New attention is given to explaining when particular econometric methods can be applied; the goal is not only to tell readers what does work, but why certain "obvious" procedures do not. The numerous included exercises, both theoretical and computer-based, allow the reader to extend methods covered in the text and discover new insights.

28,298 citations

Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.

17,433 citations

Journal ArticleDOI
TL;DR: In this article, an extension of generalized linear models to the analysis of longitudinal data is proposed, which gives consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence.
Abstract: SUMMARY This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelih ood. Some key ironh: Estimating equation; Generalized linear model; Longitudinal data; Quasi-likelihood; Repeated measures.

17,111 citations

Posted ContentDOI
17 Nov 2014-bioRxiv
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.

17,014 citations

References
More filters
Book ChapterDOI
28 Jul 2017
TL;DR: In this paper, the authors proposed a method to improve the quality of the data collected by the Japanese government through the use of data from the internet and social media, such as Facebook and Twitter.
Abstract: 第1部 1888年—1908年(幼年時代の空想;青年期の苦痛;「ジュリエット」自叙伝的実験;禁じられた愛;芸術の慰安;意志に対する感情) 第2部 1908年—1915年(ロンドンと二重生活;『ドイツの宿』の小説;孤独と危険;ミドルトン・マリと幼年時代のテーマ;夢に対する現実;役割演技) 第3部 1915—1918年(弟の死;「前奏曲」;小説としてのガーシントン;「私はフランス語が話せない」;悪夢の結婚) 第4部 1918年—1923年(「落着きを失った男」;母、父と「亡き大佐の娘たち」;愛よりもっと胸のおどる—正直;「入江のほとり」;死に悩まされる;追伸)

1,382 citations

Journal ArticleDOI
TL;DR: In this paper, the connection between quasi-likelihood functions, exponential family models and nonlinear weighted least squares is examined and consistency and asymptotic normality of the parameter estimates are discussed under second moment assumptions.
Abstract: The connection between quasi-likelihood functions, exponential family models and nonlinear weighted least squares is examined. Consistency and asymptotic normality of the parameter estimates are discussed under second moment assumptions. The parameter estimates are shown to satisfy a property of asymptotic optimality similar in spirit to, but more general than, the corresponding optimal property of Gauss-Markov estimators.

763 citations

Journal ArticleDOI

701 citations

Journal ArticleDOI
TL;DR: It is shown that the estimates are BAN, and that the iterative procedure is convergent, for a four-way contingency table for which the marginal probabilities pi and p j are known and fixed.
Abstract: SUMMARY In its simplest formulation the problem considered is to estimate the cell probabilities pij Of an r x c contingency table for which the marginal probabilities pi and p j are known and fixed, so as to minimize E2pij In (Pi/r1ij), where rij are the corresponding entries in a given contingency table. An iterative procedure is given for determining the estimates and it is shown that the estimates are BAN, and that the iterative procedure is convergent. A summary of results for a four-way contingency table is given. An illustrative example is given.

399 citations

Journal ArticleDOI
TL;DR: All advantage of the additive model will be lost, unless one can again partition the non-random portion of n7.i into functions of only one variable each.
Abstract: (1971). A New Analysis of Variance Model for Non-additive Data. Technometrics: Vol. 13, No. 1, pp. 1-18.

350 citations