# The social costs of gun ownership: Spurious regression and unfounded public policy advocacy

## Summary

- The association between guns and crime has been and continues to be a topic of intense debate in society at large and among social scientists.
- This finding sparked a furious academic debate across disciplines.
- […] A), which begins with the bold statement “There is A Proven Correlation between the Availability of Handguns and Incidents of Violence” and then goes on to draw on findings from Duggan (2001).
- The original objective was to address specialized econometric problems, such as the noisy proxy used and the truncation of the data caused by the logarithmic model, and possibly to confirm the results with five more years of data.

### 2 AN OVERVIEW

- This statistical property is the only reason C&L (Cook and Ludwig 2006) arrived at their result, on the basis of which they advocated taxing gun ownership.
- If these findings are biased or spurious, any public policy based on them may not have its intended effect and in the worst case could actually be harmful.
- Indeed, only those readers interested in such a replication need read Subsections 3.1 and 3.2.
- The results from Cook and Ludwig (2006) are replicated in Sections 3.3 and 4.3; Section 4.4 then presents a sharp twist in the results that nullifies C&L’s original conclusion.
- The analysis assumes that gun ownership may impose externalities on society (Cook and Ludwig 2006: 379–380), specifically that more guns may result in more homicides.

- That proxy is the fraction of suicides committed with a firearm (Cook and Ludwig 2006: 380), that is, “firearm suicides” divided by “suicides” (FSS or FS/S).
- (b) has been confirmed by other studies to function as intended (Azrael, Cook and Miller 2004; Kleck 2004), at least in the cross-section.
- Statistics on population size and number of homicides and suicides, as well as some sociodemographic controls for each county, are available.
- The panel of ratios is analyzed with a two-way (individual and time) fixed-effects panel model in logarithms.
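The two-way fixed-effects estimation described above can be sketched for a single regressor, assuming a balanced panel stored as a dictionary keyed by (county, year). On a balanced panel, the two-way within transformation subtracts the county mean and the year mean from each value and adds back the grand mean; OLS on the transformed data then gives the fixed-effects slope. The data are simulated and the function names are hypothetical; this is not C&L's code or data.

```python
import random

def two_way_within(panel):
    """Two-way within transformation for a balanced panel.

    panel: dict mapping (county, year) -> value. Returns
    value - county mean - year mean + grand mean for every cell.
    """
    counties = sorted({k for k, _ in panel})
    years = sorted({t for _, t in panel})
    n_k, n_t = len(counties), len(years)
    mean_k = {k: sum(panel[k, t] for t in years) / n_t for k in counties}
    mean_t = {t: sum(panel[k, t] for k in counties) / n_k for t in years}
    grand = sum(panel.values()) / len(panel)
    return {(k, t): panel[k, t] - mean_k[k] - mean_t[t] + grand
            for k in counties for t in years}

def fe_slope(y, x):
    """OLS slope of the within-transformed y on the within-transformed x."""
    y_w, x_w = two_way_within(y), two_way_within(x)
    num = sum(x_w[c] * y_w[c] for c in x_w)
    den = sum(x_w[c] ** 2 for c in x_w)
    return num / den

# Hypothetical data: log outcome = 0.3 * log proxy + county effect
# + year effect + noise, with the proxy correlated with the county effect.
rng = random.Random(1)
counties, years = range(50), range(20)
alpha = {k: rng.gauss(0, 1) for k in counties}   # county fixed effects
delta = {t: 0.05 * t for t in years}             # year fixed effects
ln_x, ln_y = {}, {}
for k in counties:
    for t in years:
        ln_x[k, t] = alpha[k] + rng.gauss(0, 0.5)
        ln_y[k, t] = 0.3 * ln_x[k, t] + 2 * alpha[k] + delta[t] + rng.gauss(0, 0.1)

beta_hat = fe_slope(ln_y, ln_x)
print(round(beta_hat, 2))
```

Because the simulated proxy is correlated with the county effect, pooled OLS would be biased here, while the within estimator recovers a slope near the simulated 0.3.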

### 3 DATA ACQUISITION

- […] and Cook and Ludwig (2003) without giving any reasons for doing so.
- Kleck (2009) has several criticisms, including (a) that C&L’s method of dealing with causal dependence is overly simple, (b) that the FSS proxy may not be valid for measuring trends in gun ownership, and, similar to Moody and Marvell’s argument, (c) that the controls used are arbitrarily chosen and that some possible necessary controls are missing from the model.
- The aggregated data set used in Cook and Ludwig (2006) is not published and the authors chose not to share it with me.
- Footnote 4: The set of selected counties does not change if the 1990 census population from the United States Census Bureau (1990) is used instead.
- Footnote 5: United States Department of Health and Human Services.

- The total population is taken from Table P3 and the number of households from Table P5.
- Footnote 10: For detailed download procedures, see Westphal (2013).

- These contain reported crime numbers aggregated at the county level.
- Study dataset 06545 (footnote 12) for the 1993 Uniform Crime Report data was not available for download at the time of writing.
- Different geographical coding schemes are found in the data: NCHS coding (footnote 17) and FIPS coding (footnote 18).
- These changes may lead to mismatched assignment of values if ignored during data extraction; thus each data source and each year had to be individually checked for such changes.

- […] Index, nchs, fips, and the tuple of state and county as interchangeable identifiers for the counties.
- I then used these numbers to calculate the percentages with the appropriate denominator.
- Switching the denominator to either total or UCRpop changes the results only marginally from those reported below; the correlation between pop and total is > 0.999, and the correlation between pop and UCRpop is > 0.989.

#### 3.3 Comparison of Descriptives

### 4 REGRESSION ANALYSIS

- This column allows comparing my data to those of C&L while at the same time giving the correct descriptives.
- Differences may be due to slightly different data sources (footnote 20), a slightly different set of observations used for computation (footnote 21), or, possibly, revised data.
- They include the proxy FSS = E955/E95 lagged by one year to circumvent possible reverse causation, i.e., people buying guns because of a higher homicide rate.
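Constructing the lagged proxy FSS = E955/E95 can be sketched as follows, assuming the counts E955 and E95 are held in plain dictionaries keyed by (county, year). The data layout, the counts, and the function name `lagged_fss` are hypothetical.

```python
def lagged_fss(firearm_suicides, suicides):
    """Map (county, year) to the previous year's FSS = E955 / E95.

    firearm_suicides, suicides: dicts keyed by (county, year) holding the
    counts E955 and E95. A cell gets no value when its previous year is not
    in the data or the previous year's suicide count is zero.
    """
    fss = {key: firearm_suicides[key] / suicides[key]
           for key in suicides if suicides[key] > 0}
    # Shift each FSS value forward one year within its county.
    return {(k, t + 1): value for (k, t), value in fss.items()
            if (k, t + 1) in suicides}

# Hypothetical counts for two counties and two years.
e955 = {("A", 1990): 6, ("A", 1991): 5, ("B", 1990): 2, ("B", 1991): 3}
e95 = {("A", 1990): 10, ("A", 1991): 10, ("B", 1990): 4, ("B", 1991): 6}
lagged = lagged_fss(e955, e95)
print(lagged)  # {('A', 1991): 0.6, ('B', 1991): 0.5}
```

The first year of each county has no lagged value, so the lag costs one year of observations per county.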

- The dependent variable Y is the homicide rate E96/pop.
- There are several ways of excluding observations that contain a ratio of zero (whose logarithm is undefined): unbalancing the panel, or removing counties or years (whichever is less costly) to keep the panel balanced.
- Results for this and those presented in the following sections are qualitatively the same and numerically close when using (various subsets of) the unbalanced panel.
- Descriptive statistics do not differ much from those set out in Table 4.
- The results are only slightly different from the results in Cook and Ludwig (2006: Table 2, final column).
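One way to implement the balanced-panel option is a greedy heuristic, sketched below under the assumption that “cost” means observations lost per zero removed: at each step, drop the county or year that removes the most zeros per observation lost, until no zero ratio remains. The heuristic, the data, and the function name are illustrative assumptions, not necessarily the paper's procedure, and greedy dropping is not guaranteed to minimize the total loss.

```python
def balance_without_zeros(panel):
    """Greedily drop counties or years until no zero cell remains.

    panel: dict (county, year) -> ratio for a balanced panel.
    Returns (kept counties, kept years), both sorted.
    """
    counties = {k for k, _ in panel}
    years = {t for _, t in panel}
    while True:
        zeros = [(k, t) for k in counties for t in years if panel[k, t] == 0]
        if not zeros:
            return sorted(counties), sorted(years)
        # Candidate drops, scored by observations lost per zero removed.
        options = []
        for k in {k for k, _ in zeros}:
            n_zero = sum(1 for z in zeros if z[0] == k)
            options.append((len(years) / n_zero, "county", k))
        for t in {t for _, t in zeros}:
            n_zero = sum(1 for z in zeros if z[1] == t)
            options.append((len(counties) / n_zero, "year", t))
        _, kind, label = min(options)
        if kind == "county":
            counties.discard(label)
        else:
            years.discard(label)

# Hypothetical panel: county C has two zero years, year 1993 one zero.
panel = {(k, t): 1.0 for k in "ABC" for t in range(1990, 1994)}
panel["C", 1990] = panel["C", 1991] = panel["A", 1993] = 0.0
kept = balance_without_zeros(panel)
print(kept)  # (['A', 'B'], [1990, 1991, 1992])
```

Dropping county C first removes two zeros at a cost of four observations; afterwards, dropping 1993 is cheaper than dropping county A.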

- Contrary to Cook and Ludwig (2006: Table 3, model 3), who needed weighting to achieve significance on β1, significance on the balanced panel is achieved without weighting.
- The within R² reported in Table 5 is orders of magnitude smaller than the R² of around 0.9 reported by Cook and Ludwig (2006: Table 2) for all their models.
- The coefficient on the female household heads changes sign between the original study and my estimation, but this does not affect the arguments in Sections 4.4, 5 or 6.
- To understand what happens when the authors estimate the first difference model, the estimating equation (2) needs to be written out in full.
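Written out under the assumption that the estimating equation has the log-ratio form described above (dependent variable E96/pop, proxy E955/E95 lagged one year, county effects $\alpha_k$, year effects $\delta_t$), and with the controls collected generically in $X_{k,t}'\gamma$ (an assumption, since the exact control set is not reproduced here), the levels model, its expansion, and its first difference read:

$$
\begin{aligned}
\ln\frac{E96_{k,t}}{pop_{k,t}} &= \beta_1 \ln\frac{E955_{k,t-1}}{E95_{k,t-1}} + X_{k,t}'\gamma + \alpha_k + \delta_t + \varepsilon_{k,t} \\
\ln E96_{k,t} - \ln pop_{k,t} &= \beta_1\left(\ln E955_{k,t-1} - \ln E95_{k,t-1}\right) + X_{k,t}'\gamma + \alpha_k + \delta_t + \varepsilon_{k,t} \\
\Delta\ln E96_{k,t} - \Delta\ln pop_{k,t} &= \beta_1\left(\Delta\ln E955_{k,t-1} - \Delta\ln E95_{k,t-1}\right) + \Delta X_{k,t}'\gamma + \left(\delta_t - \delta_{t-1}\right) + \Delta\varepsilon_{k,t}
\end{aligned}
$$

In the last line, $\alpha_k$ has dropped out, the year effects survive only as differences $\delta_t - \delta_{t-1}$, and the error becomes $\Delta\varepsilon_{k,t}$; the denominators $pop_{k,t}$ and $E95_{k,t-1}$ enter only through their growth rates.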

### 5 DISCUSSION

- […] values do not exhibit much variation orthogonal to population itself, population will be able to explain itself.
- This is a variant of the ratio fallacy, here disguised in a logarithmic model; the fallacy was first described by Pearson (1896) and discussed in detail by Kronmal (1993) (footnote 29).
- Together with Table 6, this shows that all other values vary much more strongly over time than the value based on population does (footnote 31).
- Footnote 30: Computed by an analysis-of-variance decomposition: within-county variance is the variance over time; between-county variance is the variance between counties.
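The ratio fallacy in its logarithmic form can be demonstrated with a minimal simulation, assuming three mutually independent lognormal quantities X, Y, Z (hypothetical data, not the paper's). Because ln(X/Z) = ln X − ln Z and ln(Y/Z) = ln Y − ln Z share the term −ln Z, their correlation is var(ln Z) / √((var ln X + var ln Z)(var ln Y + var ln Z)), which equals 0.5 when all three log-variances are equal, even though X and Y are unrelated.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(42)
n = 20_000
# Logs of three mutually independent quantities with equal variance.
ln_x = [rng.gauss(0, 1) for _ in range(n)]  # numerator of the left-hand ratio
ln_y = [rng.gauss(0, 1) for _ in range(n)]  # numerator of the right-hand ratio
ln_z = [rng.gauss(0, 1) for _ in range(n)]  # shared denominator

lhs = [x - z for x, z in zip(ln_x, ln_z)]  # ln(X/Z)
rhs = [y - z for y, z in zip(ln_y, ln_z)]  # ln(Y/Z)

print(round(pearson(ln_x, ln_y), 2))  # close to 0: the numerators are unrelated
print(round(pearson(lhs, rhs), 2))    # close to 0.5: induced by the shared ln Z
```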

- Of interest is the term $\beta_1(\Delta \ln E955_{k,t-1} - \Delta \ln E95_{k,t-1})$: the numerator term has double the mean squared distance from zero, and double the variance, of the denominator term.
- This means that in this specific data set, taking the first differences at least partially removes the numbers causing spurious correlations between ratios.
- Here, it essentially removes population from all terms, so that only the growth rates of the numerators remain in the model after first differencing.
- One could now argue that $E95_{k,t-1}$ is not population and that the results from Cook and Ludwig (2006) are therefore not due to the ratio fallacy.
- Looking at the correlation matrix (Table 7), one immediately sees that the correlation between suicides and population is far stronger than any other correlation between the left-hand side and the right-hand side, at least among the four variables shown in Table 7.
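The differencing mechanism described above can be illustrated with a minimal simulation under hypothetical assumptions: each county has a log population level that differs strongly across counties but moves only slightly over time, while the two numerators are independent of each other and of population. Pooled levels of ln(X/Z) and ln(Y/Z) are then clearly correlated; their first differences are not, because differencing removes the county level of the shared denominator. This illustrates the mechanism only; it is not a replication of C&L's fixed-effects model.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(7)
n_counties, n_years = 500, 20
lhs, rhs = {}, {}
for k in range(n_counties):
    size = rng.gauss(0, 1)   # county's log population level: large spread across counties
    lvl_x = rng.gauss(0, 1)  # county level of ln X
    lvl_y = rng.gauss(0, 1)  # county level of ln Y, independent of ln X
    for t in range(n_years):
        ln_z = size + rng.gauss(0, 0.02)  # shared denominator: nearly constant over time
        ln_x = lvl_x + rng.gauss(0, 0.3)
        ln_y = lvl_y + rng.gauss(0, 0.3)
        lhs[k, t] = ln_x - ln_z  # ln(X/Z)
        rhs[k, t] = ln_y - ln_z  # ln(Y/Z)

cells = sorted(lhs)
levels = pearson([lhs[c] for c in cells], [rhs[c] for c in cells])

pairs = [(k, t) for k in range(n_counties) for t in range(1, n_years)]
d_lhs = [lhs[k, t] - lhs[k, t - 1] for k, t in pairs]
d_rhs = [rhs[k, t] - rhs[k, t - 1] for k, t in pairs]
diffs = pearson(d_lhs, d_rhs)

print(round(levels, 2))  # clearly positive
print(round(diffs, 2))   # close to 0
```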

- It seems unlikely that none of these situations occurred in the original analysis, and thus there may be more sources for spurious results than just the ratio problem.
- Therefore, not all trends will be accounted for in C&L’s original model.
- This is further evidence that the differenced model has, in this case, taken the ratio problem out of the data.
- Footnote 32: Using the original denominators and testing all four linear hypotheses leads to an even more significant rejection of the null hypothesis.
- The estimation result from this model is set out in the column labeled “Eq. (17)” in Table 9.

### 6 ALTERNATIVE SPECIFICATIONS

- Any spurious correlation between the left-hand side and the right-hand side due to population appearing on both sides is no longer possible.
- Time series problems are not accounted for.

#### 6.3 Risk Model

- Duggan (2003: 48–50) proposes a model (footnote 38) for explaining individual $i$’s suicide decision (footnote 39): $\Pr = \alpha + X_i\theta + \gamma\,Gun_i + \lambda_i + \epsilon_i$ (20), with $X_i$ being individual observable controls, $Gun_i$ a dummy for gun ownership, and $\lambda_i$ individual $i$’s unobserved propensity to commit suicide.
- The following argument holds for other monotonic link functions as well.

- Assume, for simplicity, that only gun owners are able to commit suicide by firearm.
- Now attribute an additional risk to each gun owner (footnote 42); then the relation $\frac{\beta_0 + \beta_{1,1}}{\beta_0} \sim RR_{\text{gunowner}}$ (25) exists, provided the number of firearm suicides is somehow linked to the number of gun owners.
- For burglaries, it might just mean there are around 170 times as many burglaries as homicides.
- This model is susceptible to criticism for obvious heteroscedasticity across counties with different levels of population.

- Here, $X_{k,t}$ contains the log growth rates of the controls.
- Due to their nearly perfect correlation with $pop_{k,t}^{-1}$, I removed $pop5plus_{k,t}^{-1}$ and $households_{k,t}^{-1}$ from the model.
- Including $pop_{k,t}^{-1}$ on the right-hand side is obviously ridiculous.
- This is the only setting showing this result.
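Log growth rates like those collected in $X_{k,t}$ can be computed per county as sketched below, assuming positive counts in a dictionary keyed by (county, year); the function name and the numbers are hypothetical. The final check shows how a growth-rate model sidesteps the ratio construction: the growth rate of a ratio splits exactly into the growth rates of its numerator and denominator, so population enters only through its own growth rate.

```python
import math

def log_growth(panel):
    """Log growth rates ln x[k, t] - ln x[k, t - 1] for a (county, year) panel.

    panel: dict (county, year) -> positive value. Cells without a previous
    year are skipped; a zero in a used cell makes math.log raise ValueError.
    """
    return {(k, t): math.log(v) - math.log(panel[k, t - 1])
            for (k, t), v in panel.items() if (k, t - 1) in panel}

# Hypothetical counts for one county.
homicides = {("A", 1990): 10, ("A", 1991): 12, ("A", 1992): 9}
population = {("A", 1990): 1000, ("A", 1991): 1010, ("A", 1992): 1020}

g_h = log_growth(homicides)
g_p = log_growth(population)
g_rate = log_growth({key: homicides[key] / population[key] for key in homicides})

# The growth rate of the ratio equals the difference of the two growth rates.
assert all(abs(g_rate[c] - (g_h[c] - g_p[c])) < 1e-12 for c in g_rate)
```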

### 7 CONCLUSION

- Thus, from a goodness-of-fit point of view, the only thing C&L’s full model does is add a lot of noise to Equation (28).

##### References


### "The social costs of gun ownership: ..." refers background in this paper

…Regression between time series is known to produce spurious results in the following settings: trending or autocorrelated time series (Granger and Newbold 1974), I(1) processes without drift (Phillips 1986), I(1) processes with further stationary regressors (Hassler 1996), stationary AR processes…
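The first setting listed (Granger and Newbold 1974) can be reproduced with a small Monte Carlo sketch: regress one driftless random walk on another, independent one, and count how often the slope's |t| exceeds 1.96. The sample size, number of replications, and cutoff are illustrative assumptions; with independent white noise instead of random walks, the rejection rate should stay near the nominal 5%.

```python
import random

def ols_t_stat(y, x):
    """t statistic of the slope in an OLS regression of y on x with intercept."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return slope / se

def random_walk(rng, n):
    """Driftless random walk with standard normal increments."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

def rejection_rate(rng, n_obs, n_reps, walks):
    """Share of replications with |t| > 1.96 for two independent series."""
    hits = 0
    for _ in range(n_reps):
        if walks:
            y, x = random_walk(rng, n_obs), random_walk(rng, n_obs)
        else:
            y = [rng.gauss(0, 1) for _ in range(n_obs)]
            x = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(ols_t_stat(y, x)) > 1.96:
            hits += 1
    return hits / n_reps

rng = random.Random(0)
print(rejection_rate(rng, 50, 500, walks=True))   # far above 0.05
print(rejection_rate(rng, 50, 500, walks=False))  # close to 0.05
```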


### "The social costs of gun ownership: ..." refers methods in this paper

…The lack of efficiency can be dealt with after estimation by applying Driscoll and Kraay (1998) via Croissant and Millo’s (2008) vcovSCC function (footnote 25). Contrary to Cook […] (Footnote 22: An example would be $\sum_k d_k = 0$.)…


##### Frequently Asked Questions

###### Q2. What is the first method for removing the spuriousness from C&L’s model?

The first method for removing the spuriousness from C&L’s model is given by Kronmal (1993: 390): include the inverse of the deflating variable as an explanatory variable on the right-hand side.
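Kronmal's remedy can be illustrated with simulated data, assuming mutually independent uniform X and Y and a shared deflator Z (all hypothetical): regressing Y/Z on X/Z yields a clearly positive slope even though Y and X are unrelated, while adding 1/Z as a regressor pushes that slope toward zero. The small normal-equation solver is a stand-in for a regression library, suitable only for a handful of regressors.

```python
import random

def ols(rows, y):
    """OLS coefficients via the normal equations, solved by Gauss-Jordan.

    rows: list of regressor tuples, each including a leading 1.0 for the
    intercept. Returns the coefficient list in the same order.
    """
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(3)
n = 5000
x = [rng.uniform(50, 150) for _ in range(n)]
y = [rng.uniform(50, 150) for _ in range(n)]  # independent of x
z = [rng.uniform(1, 3) for _ in range(n)]     # shared deflator

lhs = [b / d for b, d in zip(y, z)]           # Y/Z
ratio = [a / d for a, d in zip(x, z)]         # X/Z

naive = ols([(1.0, r) for r in ratio], lhs)
kronmal = ols([(1.0, r, 1.0 / d) for r, d in zip(ratio, z)], lhs)
print(round(naive[1], 2))    # clearly positive, although Y and X are independent
print(round(kronmal[1], 2))  # close to 0 once 1/Z is on the right-hand side
```

Here E[Y/Z | X, Z] = E[Y]/Z, which is exactly linear in 1/Z, so once 1/Z is included nothing is left for X/Z to explain.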

###### Q3. What happens to the fixed effects when the model is first-differenced?

When the model is first-differenced, the individual fixed effects disappear, the time fixed effects are transformed into differences between consecutive time fixed effects, and the errors are transformed into first differences as well.

###### Q4. What is the reason why the original study fell prey to the ratio fallacy?

All the “explanatory power” (goodness-of-fit-wise and significance-wise) of the original analysis was due to regional and intertemporal differences and population being explained by itself.

###### Q6. What is the weighting used by Cook and Ludwig?

By applying weighting to account for heteroscedasticity (Cook and Ludwig 2006: 382) and calculating standard errors that are robust to heteroscedasticity (Cook and Ludwig 2006: 382), C&L basically “double correct” for heteroscedasticity.

###### Q7. How does the author analyze the data?

The authors use the advanced method of panel analysis and analyze a comprehensive data set covering 200 U.S. counties and 20 years.

###### Q8. What is the way to standardize without using ratios?

Multicollinearity might also be an issue, as all numbers used are parts of the population and are therefore contained in $pop_{k,t}$. In Section 6.4 (Growth Model), a way to standardize without using ratios is to use growth rates.