Posted Content

What Are We Weighting For?

01 Feb 2013 - Research Papers in Economics (National Bureau of Economic Research, Inc)
TL;DR: Three distinct weighting motives are discussed: to achieve precise estimates by correcting for heteroskedasticity; to achieve consistent estimates by correcting for endogenous sampling; and to identify average partial effects in the presence of unmodeled heterogeneity of effects.
Abstract: The purpose of this paper is to help empirical economists think through when and how to weight the data used in estimation. We start by distinguishing two purposes of estimation: to estimate population descriptive statistics and to estimate causal effects. In the former type of research, weighting is called for when it is needed to make the analysis sample representative of the target population. In the latter type, the weighting issue is more nuanced. We discuss three distinct potential motives for weighting when estimating causal effects: (1) to achieve precise estimates by correcting for heteroskedasticity, (2) to achieve consistent estimates by correcting for endogenous sampling, and (3) to identify average partial effects in the presence of unmodeled heterogeneity of effects. In each case, we find that the motive sometimes does not apply in situations where practitioners often assume it does. We recommend diagnostics for assessing the advisability of weighting, and we suggest methods for appropriate inference.
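The first motive above is the classic weighted least squares argument: if each observation's error variance were known, reweighting by its inverse would yield a more precise estimator than OLS. A minimal numpy sketch, with a made-up variance function that the example assumes is known:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
# Heteroskedastic errors: variance 0.5 + x (an assumed, known form)
y = 1.0 + 2.0 * x + rng.normal(0, np.sqrt(0.5 + x))

# OLS: (X'X)^{-1} X'y -- unbiased but not efficient here
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# WLS: weight each observation by its inverse error variance
w = 1.0 / (0.5 + x)
Xw = X * w[:, None]                        # rows of X scaled by w_i
beta_wls = np.linalg.solve(Xw.T @ X, Xw.T @ y)

# Both estimates should be near the true (1, 2); WLS has the
# smaller sampling variance when the variance model is correct
print(beta_ols, beta_wls)
```

The paper's caution applies here: the efficiency gain depends on the assumed variance function being right, which is exactly the kind of assumption practitioners should diagnose rather than take for granted.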
Citations
Journal ArticleDOI
TL;DR: This work considers statistical inference for regression when data are grouped into clusters, with regression model errors independent across clusters but correlated within clusters, when the number of clusters is large and default standard errors can greatly overstate estimator precision.
Abstract: We consider statistical inference for regression when data are grouped into clusters, with regression model errors independent across clusters but correlated within clusters. Examples include data on individuals with clustering on village or region or other category such as industry, and state-year differences-in-differences studies with clustering on state. In such settings default standard errors can greatly overstate estimator precision. Instead, if the number of clusters is large, statistical inference after OLS should be based on cluster-robust standard errors. We outline the basic method as well as many complications that can arise in practice. These include cluster-specific fixed effects, few clusters, multi-way clustering, and estimators other than OLS.

3,236 citations
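The cluster-robust ("sandwich") covariance the abstract describes can be sketched in a few lines of numpy; the data-generating process and cluster sizes below are illustrative assumptions:

```python
import numpy as np

def cluster_robust_cov(X, resid, clusters):
    """Liang-Zeger cluster-robust OLS covariance:
    (X'X)^{-1} [sum_g X_g' u_g u_g' X_g] (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(clusters):
        idx = clusters == g
        s = X[idx].T @ resid[idx]      # score summed within cluster g
        meat += np.outer(s, s)
    return XtX_inv @ meat @ XtX_inv

rng = np.random.default_rng(1)
G, m = 50, 20                          # 50 clusters of 20 observations
clusters = np.repeat(np.arange(G), m)
x = rng.normal(size=G * m)
# Errors share a cluster-level component, so they correlate within clusters
u = rng.normal(size=G)[clusters] + rng.normal(size=G * m)
y = 0.5 * x + u

X = np.column_stack([np.ones(G * m), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
se = np.sqrt(np.diag(cluster_robust_cov(X, resid, clusters)))
```

This is the large-G case the paper treats as the baseline; the few-clusters complications it discusses require further adjustments not shown here.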

Journal ArticleDOI
TL;DR: This paper showed that the two-way fixed effects estimator equals a weighted average of all possible two-group/two-period DD estimators in the data, decomposed the difference between two specifications, and provided a new analysis of models that include time-varying controls.

1,414 citations

Journal ArticleDOI
TL;DR: Key features of DID designs are reviewed with an emphasis on public health policy research and it is noted that combining elements from multiple quasi-experimental techniques may be important in the next wave of innovations to the DID approach.
Abstract: The difference-in-differences (DID) design is a quasi-experimental research design that researchers often use to study causal relationships in public health settings where randomized controlled trials (RCTs) are infeasible or unethical. However, causal inference poses many challenges in DID designs. In this article, we review key features of DID designs with an emphasis on public health policy research. Contemporary researchers should take an active approach to the design of DID studies, seeking to construct comparison groups, sensitivity analyses, and robustness checks that help validate the method's assumptions. We explain the key assumptions of the design and discuss analytic tactics, supplementary analysis, and approaches to statistical inference that are often important in applied research. The DID design is not a perfect substitute for randomized experiments, but it often represents a feasible way to learn about causal relationships. We conclude by noting that combining elements from multiple quasi-experimental techniques may be important in the next wave of innovations to the DID approach.

789 citations
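The contrast at the heart of these designs is the basic 2x2 DID: the treated group's change minus the control group's change. The four cell means below are hypothetical:

```python
# Hypothetical 2x2 panel: treated/control groups, pre/post periods
means = {
    ("treated", "pre"): 10.0, ("treated", "post"): 14.0,
    ("control", "pre"):  9.0, ("control", "post"): 11.0,
}

did = ((means[("treated", "post")] - means[("treated", "pre")])
       - (means[("control", "post")] - means[("control", "pre")]))
print(did)  # 2.0: the treatment effect under the parallel-trends assumption
```

The control group's change (2.0) stands in for the trend the treated group would have followed without treatment; everything beyond it (4.0 - 2.0) is attributed to the policy, which is exactly why the parallel-trends assumption deserves the sensitivity checks the abstract calls for.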

Journal ArticleDOI
TL;DR: In this paper, the authors estimate the effect of minimum wages on low-wage jobs using 138 prominent state-level minimum wage changes between 1979 and 2016 in the U.S. using a difference-in-differences approach.
Abstract: We estimate the effect of minimum wages on low-wage jobs using 138 prominent state-level minimum wage changes between 1979 and 2016 in the U.S. using a difference-in-differences approach. We first estimate the effect of the minimum wage increase on employment changes by wage bins throughout the hourly wage distribution. We then focus on the bottom part of the wage distribution and compare the number of excess jobs paying at or slightly above the new minimum wage to the missing jobs paying below it to infer the employment effect. We find that the overall number of low-wage jobs remained essentially unchanged over the five years following the increase. At the same time, the direct effect of the minimum wage on average earnings was amplified by modest wage spillovers at the bottom of the wage distribution. Our estimates by detailed demographic groups show that the lack of job loss is not explained by labor-labor substitution at the bottom of the wage distribution. We also find no evidence of disemployment when we consider higher levels of minimum wages. However, we do find some evidence of reduced employment in tradable sectors. We also show how decomposing the overall employment effect by wage bins allows a transparent way of assessing the plausibility of estimates.

449 citations

Journal ArticleDOI
TL;DR: MMLs have no discernible impact on drinking behavior for those aged 12-20, or on the use of other psychoactive substances in either age group, but they increase the probability of current marijuana use, regular marijuana use, and marijuana abuse/dependence among those aged 21 or above.

313 citations


Cites background from "What Are We Weighting For"

  • …age 21 stratification, which, under the NSDUH sampling design, would suppress the state-clustering adjustment. When considering the choice between the two, Solon et al. (2013) noted that theoretically “neither strictly dominates the other (in identifying the population average effect)” (Solon et al., 2013, p. 21).


References
Journal ArticleDOI
TL;DR: This paper presents a parameter covariance matrix estimator that is consistent even when the disturbances of a linear regression model are heteroskedastic, and that does not depend on a formal model of the structure of the heteroskedasticity.
Abstract: This paper presents a parameter covariance matrix estimator which is consistent even when the disturbances of a linear regression model are heteroskedastic. This estimator does not depend on a formal model of the structure of the heteroskedasticity. By comparing the elements of the new estimator to those of the usual covariance estimator, one obtains a direct test for heteroskedasticity, since in the absence of heteroskedasticity, the two estimators will be approximately equal, but will generally diverge otherwise. The test has an appealing least squares interpretation.

25,689 citations
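White's estimator replaces the classical σ²(X'X)⁻¹ with a sandwich whose middle term uses each observation's squared residual. A small numpy sketch with simulated data; the variance function is an assumption of the example:

```python
import numpy as np

def hc0_cov(X, resid):
    """White's heteroskedasticity-consistent (HC0) covariance:
    (X'X)^{-1} [X' diag(u_i^2) X] (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * resid[:, None] ** 2).T @ X
    return XtX_inv @ meat @ XtX_inv

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 1.0 * x + rng.normal(0, x)       # error s.d. grows with x

beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

sigma2 = resid @ resid / (n - 2)
V_default = sigma2 * np.linalg.inv(X.T @ X)  # classical estimator
V_hc0 = hc0_cov(X, resid)
```

Comparing the elements of `V_hc0` and `V_default` is the idea behind the direct test the abstract mentions: under homoskedasticity the two are approximately equal, and they diverge otherwise.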

MonographDOI
31 Jul 1997
TL;DR: Deaton reviews the analysis of household survey data, including the construction of household surveys, the econometric tools useful for such analysis, and a range of problems in development policy to which this survey analysis can be applied.
Abstract: Two decades after its original publication, The Analysis of Household Surveys is reissued with a new preface by its author, Sir Angus Deaton, recipient of the 2015 Nobel Prize in Economic Sciences. This classic work remains relevant to anyone with a serious interest in using household survey data to shed light on policy issues. This book reviews the analysis of household survey data, including the construction of household surveys, the econometric tools useful for such analysis, and a range of problems in development policy for which this survey analysis can be applied. The author's approach remains close to the data, using transparent econometric and graphical techniques to present data in a way that can clearly inform policy and academic debates. Chapter 1 describes the features of survey design that need to be understood in order to undertake appropriate analysis. Chapter 2 discusses the general econometric and statistical issues that arise when using survey data for estimation and inference. Chapter 3 covers the use of survey data to measure welfare, poverty, and distribution. Chapter 4 focuses on the use of household budget data to explore patterns of household demand. Chapter 5 discusses price reform, its effects on equity and efficiency, and how to measure them. Chapter 6 addresses the role of household consumption and saving in economic development. The book includes an appendix providing code and programs using STATA, which can serve as a template for the users' own analysis.

4,835 citations

Journal ArticleDOI
TL;DR: In this paper, a new class of semiparametric estimators, based on inverse probability weighted estimating equations, is proposed for the parameter vector α0 of the conditional mean model when the data are missing at random in the sense of Rubin and the missingness probabilities are either known or can be parametrically modeled.
Abstract: In applied problems it is common to specify a model for the conditional mean of a response given a set of regressors. A subset of the regressors may be missing for some study subjects either by design or happenstance. In this article we propose a new class of semiparametric estimators, based on inverse probability weighted estimating equations, that are consistent for parameter vector α0 of the conditional mean model when the data are missing at random in the sense of Rubin and the missingness probabilities are either known or can be parametrically modeled. We show that the asymptotic variance of the optimal estimator in our class attains the semiparametric variance bound for the model by first showing that our estimation problem is a special case of the general problem of parameter estimation in an arbitrary semiparametric model in which the data are missing at random and the probability of observing complete data is bounded away from 0, and then deriving a representation for the efficient score...

2,638 citations
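A stripped-down version of the inverse-probability-weighting idea, for the simplest case of estimating a mean when data are missing at random given an always-observed covariate; the missingness probabilities here are assumed known, as in one of the paper's cases:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
x = rng.binomial(1, 0.5, n)                # always-observed covariate
y = 2.0 + 3.0 * x + rng.normal(size=n)     # true population mean = 3.5

# Missing at random: observation probability depends only on x
p_obs = np.where(x == 1, 0.9, 0.4)
observed = rng.uniform(size=n) < p_obs

# Complete-case mean is biased: the x = 1 group is over-represented
naive = y[observed].mean()

# Inverse probability weighting undoes the selection
ipw = np.sum(observed * y / p_obs) / np.sum(observed / p_obs)
print(naive, ipw)  # naive overshoots 3.5; IPW is close to it
```

Each observed unit is up-weighted by 1/p_obs to stand in for the similar units that went unobserved, which is the same logic the paper generalizes to regression parameters via weighted estimating equations.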

Posted Content
TL;DR: The authors examined the effect of technological change and other factors on the relative demand for workers with different education levels and on the recent growth of U.S. educational wage differentials and found that the increase in demand shifts for more-skilled workers in the 1970s and 1980s relative to the 1960s is entirely accounted for by an increase in within-industry changes in skill utilization rather than between-industry employment shifts.
Abstract: This paper examines the effect of technological change and other factors on the relative demand for workers with different education levels and on the recent growth of U.S. educational wage differentials. A simple supply-demand framework is used to interpret changes in the relative quantities, wages, and wage bill shares of workers by education in the aggregate U.S. labor market in each decade since 1940 and over the 1990 to 1995 period. The results suggest that the relative demand for college graduates grew more rapidly on average during the past 25 years (1970-95) than during the previous three decades (1940-70). The increased rate of growth of relative demand for college graduates beginning in the 1970s did not lead to an increase in the college/high school wage differential until the 1980s because the growth in the supply of college graduates increased even more sharply in the 1970s before returning to historical levels in the 1980s. The acceleration in demand shifts for more-skilled workers in the 1970s and 1980s relative to the 1960s is entirely accounted for by an increase in within- industry changes in skill utilization rather than between-industry employment shifts. Industries with large increases in the rate of skill upgrading in the 1970s and 1980s versus the 1960s are those with greater growth in employee computer usage, more computer capital per worker, and larger shares of computer investment as a share of total investment. The results suggest that the spread of computer technology may "explain" as much as 30 to 50 percent of the increase in the rate of growth of the relative demand for more-skilled workers since 1970.

1,943 citations

Journal ArticleDOI
TL;DR: In this paper, an estimator is proposed for the parameters of a probabilistic choice model when choices rather than decision makers are sampled, so that a sequence of chosen alternatives is drawn and the characteristics of the decision makers selecting those alternatives are observed.
Abstract: THE CONCERN of this paper is the estimation of the parameters of a probabilistic choice model when choices rather than decision makers are sampled. Existing estimation methods presuppose an exogenous sampling process, that is, one in which a sequence of decision makers are drawn and their choice behaviors observed. In contrast, in choice-based sampling processes, a sequence of chosen alternatives are drawn and the characteristics of the decision makers selecting those alternatives are observed. The problem of estimating a choice model from a choice-based sample has substantive interest because data collection costs for such processes are often considerably smaller than for exogenous sampling. Particular instances of this differential occur in the analysis of transportation behavior. For example, in studying choice of mode for work trips, it is often less expensive to survey transit users at the station and auto users at the parking lot than to interview commuters at their homes. Similarly, in examining choice of destination for shopping trips, surveys conducted at various shopping centers offer significant cost savings relative to home interviews. While interest in transportation applications provided the original motivation for our work, it has become apparent that choice-based sampling processes can be cost effective in the analysis of numerous decision problems. In particular, wherever decision makers are physically clustered according to the alternatives they select, choice-based sampling processes can achieve economies of scale not available with exogenous sampling. Some non-transportation decision problems in which decision makers do cluster as described include the schooling decisions of students, the job decisions of workers, the medical care decisions of patients, and the residential location decisions of households.
Realization of the sampling cost benefits of choice-based samples presupposes, of course, that the parameters of the underlying choice model can logically be inferred from such samples and that a tractable estimator with desirable statistical properties can be found. We shall, in this paper, confirm the logical supposition, develop a suitable estimator, and characterize the behavior of existing, exogenous sampling, estimators in the context of choice-based samples. An outline of the presentation and summary of major results follows.

1,304 citations
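Manski and Lerman's correction for choice-based sampling is commonly implemented as a weighted maximum likelihood estimator in which each observation is weighted by the ratio of its outcome's population share to its sample share. The sketch below applies that weighting to a logit fit by Newton's method; the population model and sampling fractions are invented for illustration:

```python
import numpy as np

def weighted_logit(X, y, w, iters=25):
    """Newton's method for logit MLE with observation weights w
    (for choice-based samples: w_i = population share / sample share
    of observation i's chosen outcome)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        H = (X * (w * p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(H, grad)
    return beta

rng = np.random.default_rng(4)
N = 200_000
x = rng.normal(size=N)
p = 1.0 / (1.0 + np.exp(-(-1.8 + 1.0 * x)))    # population choice model
y = rng.uniform(size=N) < p
Q1 = y.mean()                                  # population share choosing y=1

# Choice-based sample: draw 50/50 on the outcome despite unequal
# population shares, then reweight by Q_j / H_j with H_j = 0.5
n = 5_000
idx = np.concatenate([
    rng.choice(np.flatnonzero(~y), n // 2, replace=False),
    rng.choice(np.flatnonzero(y), n // 2, replace=False),
])
Xs = np.column_stack([np.ones(n), x[idx]])
ys = y[idx].astype(float)
w = np.where(ys == 1, Q1 / 0.5, (1 - Q1) / 0.5)

beta = weighted_logit(Xs, ys, w)   # should be near the true (-1.8, 1.0)
```

An unweighted logit on this sample would get the intercept badly wrong; the reweighting restores consistency, at the cost of requiring the population shares (here taken from the simulated population) to be known or estimable.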

Trending Questions (1)
Why do we use a weighted mean in quantitative research papers?

Weighted mean is used in quantitative research papers to make the analysis sample representative of the target population and to achieve precise and consistent estimates.
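Concretely, with survey data the weighted mean scales each value by how much of the population it represents; the stratum values and counts below are hypothetical:

```python
import numpy as np

# Hypothetical stratified survey: one sampled value per stratum,
# weighted by the population count each stratum represents
values = np.array([50.0, 60.0, 80.0])
weights = np.array([700, 200, 100])    # population counts per stratum

weighted_mean = np.average(values, weights=weights)
print(weighted_mean)  # 55.0 = (50*700 + 60*200 + 80*100) / 1000
```

The unweighted mean of the three values is about 63.3, which over-represents the small strata; weighting recovers the population average, which is the "representativeness" motive described in the abstract above.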