Journal ArticleDOI

Generalized Linear Models (2nd ed.)

01 May 1992 - Technometrics (Taylor & Francis Group) - Vol. 34, Iss. 2, pp. 223-224
About: This article was published in Technometrics on 1992-05-01 and has received 1,484 citations to date. It focuses on the topics: Generalized linear mixed model & Generalized additive model.
Citations
Journal ArticleDOI
Hui Zou
TL;DR: A new version of the lasso is proposed, called the adaptive lasso, where adaptive weights are used for penalizing different coefficients in the ℓ1 penalty, and the nonnegative garotte is shown to be consistent for variable selection.
Abstract: The lasso is a popular technique for simultaneous estimation and variable selection. Lasso variable selection has been shown to be consistent under certain conditions. In this work we derive a necessary condition for the lasso variable selection to be consistent. Consequently, there exist certain scenarios where the lasso is inconsistent for variable selection. We then propose a new version of the lasso, called the adaptive lasso, where adaptive weights are used for penalizing different coefficients in the ℓ1 penalty. We show that the adaptive lasso enjoys the oracle properties; namely, it performs as well as if the true underlying model were given in advance. Similar to the lasso, the adaptive lasso is shown to be near-minimax optimal. Furthermore, the adaptive lasso can be solved by the same efficient algorithm for solving the lasso. We also discuss the extension of the adaptive lasso in generalized linear models and show that the oracle properties still hold under mild regularity conditions. As a byproduct, the nonnegative garotte is shown to be consistent for variable selection.

6,765 citations


Additional excerpts

  • ...The generic density form can be written as (McCullagh and Nelder 1989) f(y | x, θ) = h(y) exp(yθ − φ(θ))....

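The two-step recipe described in the abstract lends itself to a short illustration: because the adaptive ℓ1 penalty becomes an ordinary lasso penalty after rescaling the design matrix by the adaptive weights, a standard lasso solver can be reused. The following is a minimal sketch, not code from the cited paper; the OLS pilot estimate, the weight exponent of 1, and the fixed penalty level are illustrative assumptions.

```python
# Minimal adaptive-lasso sketch (illustrative, not the cited paper's code).
# Weights w_j = 1/|beta_ols_j| penalize coefficients unevenly; rescaling the
# columns of X by 1/w_j turns the weighted l1 problem into an ordinary lasso.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

# Step 1: pilot estimate (OLS here) used to build the adaptive weights.
beta_init = LinearRegression().fit(X, y).coef_
weights = 1.0 / (np.abs(beta_init) + 1e-8)

# Step 2: ordinary lasso on the rescaled design, then map coefficients back.
lasso = Lasso(alpha=0.1).fit(X / weights, y)
beta_adaptive = lasso.coef_ / weights
print(np.round(beta_adaptive, 2))  # large true effects kept, zeros shrunk away
```

In practice the penalty level would be chosen by cross-validation and the weight exponent tuned; the point of the sketch is only the reduction of the weighted ℓ1 penalty to a plain lasso, which is why the same efficient algorithm applies.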

Journal ArticleDOI
TL;DR: In this paper, the proposed estimator is applied to the gravity equation for trade to provide new estimates of that equation, and significant differences are found between the estimates it produces and those obtained with the traditional log-linear method.
Abstract: Although economists have long been aware of Jensen's inequality, many econometric applications have neglected an important implication of it: the standard practice of interpreting the parameters of log-linearized models estimated by ordinary least squares as elasticities can be highly misleading in the presence of heteroskedasticity. This paper explains why this problem arises and proposes an appropriate estimator. Our criticism of conventional practices and the solution we propose extend to a broad range of economic applications where the equation under study is log-linearized. We develop the argument using one particular illustration, the gravity equation for trade, and apply the proposed technique to provide new estimates of this equation. We find significant differences between estimates obtained with the proposed estimator and those obtained with the traditional method. These discrepancies persist even when the gravity equation takes into account multilateral resistance terms or fixed effects.

4,492 citations
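
To make the Jensen's-inequality point concrete, here is a small simulation sketch; the data-generating process, the variable names, and the specific form of heteroskedasticity are illustrative assumptions, not the paper's empirical setup. OLS on log(y) drifts away from the true elasticity when the multiplicative error is heteroskedastic, while a Poisson pseudo-maximum-likelihood fit of y on log(x) with robust standard errors recovers it.

```python
# Sketch of the log-linearization problem under heteroskedasticity (assumed DGP).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 20_000
log_x = rng.normal(size=n)
sigma = 0.5 + 0.5 * (log_x > 0)           # error spread depends on the regressor
eps = rng.normal(scale=sigma)
eta = np.exp(eps - 0.5 * sigma**2)        # E[eta | x] = 1, but E[log eta | x] varies
y = np.exp(1.0 + 1.0 * log_x) * eta       # true elasticity = 1.0

X = sm.add_constant(log_x)

# Log-linear OLS: interpreting the slope as an elasticity is misleading here.
ols = sm.OLS(np.log(y), X).fit()

# Poisson pseudo-maximum-likelihood with heteroskedasticity-robust errors.
ppml = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC1")

print("OLS slope :", round(ols.params[1], 3))   # noticeably below 1
print("PPML slope:", round(ppml.params[1], 3))  # close to 1
```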

Journal ArticleDOI
TL;DR: In this paper, inference for generalized linear mixed models (GLMMs) is obtained by approximating the marginal quasi-likelihood with Laplace's method, yielding penalized quasi-likelihood estimating equations for the mean parameters and pseudo-likelihood for the variance components, with the dispersion matrix optionally specified in terms of a rank-deficient inverse covariance matrix.
Abstract: Statistical approaches to overdispersion, correlated errors, shrinkage estimation, and smoothing of regression relationships may be encompassed within the framework of the generalized linear mixed model (GLMM). Given an unobserved vector of random effects, observations are assumed to be conditionally independent with means that depend on the linear predictor through a specified link function and conditional variances that are specified by a variance function, known prior weights and a scale factor. The random effects are assumed to be normally distributed with mean zero and dispersion matrix depending on unknown variance components. For problems involving time series, spatial aggregation and smoothing, the dispersion may be specified in terms of a rank deficient inverse covariance matrix. Approximation of the marginal quasi-likelihood using Laplace's method leads eventually to estimating equations based on penalized quasi-likelihood (PQL) for the mean parameters and pseudo-likelihood for the variances. Im...

4,317 citations


Cites background or methods from "Generalized Linear Models (2nd ed.)..."

  • ...0 for the canonical link functions, for which g′(μ) = v⁻¹(μ) (McCullagh and Nelder 1989, p. 32)....

  • ...[flattened table excerpt omitted: Model C estimates, separate effects for each experiment (Summer '86, Fall '86 rerun, Fall '86); sources: McCullagh and Nelder (1989), table 14.10; Karim and Zeger (1992), table 3, medians]

  • ...Standard GLM techniques (McCullagh and Nelder 1989, sec....

  • ...Differentiating again with respect to b, we have κ″(b) ≈ Σᵢ wᵢ zᵢzᵢ′ + D⁻¹ = Z′WZ + D⁻¹, (4) where W is the n × n diagonal matrix with diagonal terms wᵢ = {aᵢ v(μᵢ)[g′(μᵢ)]²}⁻¹ that are recognizable as the GLM iterated weights (Firth 1991, p. 63; McCullagh and Nelder 1989, sec....

  • ...Because of the balanced design, their quasi-likelihood estimates of the regression parameters were identical to those obtained from standard logistic regression under independence (McCullagh and Nelder 1989, ex....

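Connecting the abstract with the excerpts above, the quantity that keeps reappearing is the diagonal GLM iterated weight wᵢ = {aᵢ v(μᵢ)[g′(μᵢ)]²}⁻¹. Below is a minimal sketch for the binomial/logit case, where v(μ) = μ(1 − μ) and g′(μ) = 1/[μ(1 − μ)]; the function name and test values are illustrative assumptions, and no claim is made that this mirrors the paper's implementation.

```python
# GLM iterated weights w_i = {a_i v(mu_i) [g'(mu_i)]^2}^(-1), logit-link case.
import numpy as np

def iterated_weights_logit(eta, a=1.0):
    """Diagonal of W evaluated at linear predictor eta (binomial/logit model)."""
    mu = 1.0 / (1.0 + np.exp(-eta))      # inverse link: fitted mean
    v = mu * (1.0 - mu)                  # variance function v(mu)
    g_prime = 1.0 / v                    # derivative of the logit link g(mu)
    return 1.0 / (a * v * g_prime**2)    # simplifies to mu(1 - mu) / a

print(iterated_weights_logit(np.array([-1.0, 0.0, 2.0])))  # ~[0.197, 0.25, 0.105]
```

In the PQL iteration described in the abstract, these weights populate the diagonal matrix W in Z′WZ + D⁻¹ and are refreshed at each update of the working means.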
Posted Content
01 Jan 2001
TL;DR: This paper gives a lightning overview of data mining and its relation to statistics, with particular emphasis on tools for the detection of adverse drug reactions.
Abstract: The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics. The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.

3,765 citations

Posted Content
TL;DR: It is shown that more efficient sampling designs exist for making valid inferences, such as sampling all available events and a tiny fraction of nonevents, which enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables.
Abstract: We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros ("nonevents"). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.

3,170 citations


Additional excerpts

  • ...We begin with McCullagh and Nelder’s (1989) analytical approximations, but we focus on rare events....

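As a rough illustration of the sampling-design point in the abstract, the sketch below fits an ordinary logit to a sample that keeps every event but only a small fraction of nonevents, then shifts the intercept with the standard case-control (prior) correction using the known population event fraction. The simulated data, the 5% subsampling rate, and the variable names are assumptions for illustration; this shows only the textbook intercept correction, not the full set of corrections the paper develops.

```python
# Case-control subsampling with an intercept (prior) correction -- illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200_000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(-5.0 + 1.0 * x)))      # rare events: roughly 1% ones
y = rng.binomial(1, p)
tau = y.mean()                                    # population event fraction

keep = (y == 1) | (rng.random(n) < 0.05)          # all events + 5% of nonevents
ys, xs = y[keep], x[keep]
ybar = ys.mean()                                  # event fraction in the sample

fit = sm.Logit(ys, sm.add_constant(xs)).fit(disp=0)
b0, b1 = fit.params
b0_corrected = b0 - np.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

print(round(b0, 2), round(b0_corrected, 2), round(b1, 2))  # corrected b0 near -5
```

The slope is unaffected by this kind of subsampling; only the intercept needs the shift, which is what makes keeping all events and a sliver of nonevents such an inexpensive design.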