fixed points in data

For far too long now, whenever someone shares my “graphics matter” series of posts, someone pipes up with a response that can be summarized as:

“We don’t know how many firearms there are in America, so we cannot draw any correlation or causation conclusions between firearms and the people they kill.”

It is time I address this notion.

The first phrase of the sentence is actually correct.  Not only are firearms a durable good – meaning that firearms produced before the Revolutionary War could still be used today so long as they were properly maintained – but national registration of non-NFA-regulated firearms is outlawed by the Firearm Owners Protection Act.

As well it should be – the government simply has no business knowing what I own.

“But that means the second phrase of the sentence has to be correct, right?”

Well, no.

The entire point of the “graphics matter” series is to examine the correlation of the number of firearms in America with the number of firearm-related fatalities or crimes in America, as well as the per-population rate of the same.  The fun thing about correlating two data sets is that you do not need to know their starting points.

“Wait, what?”

It is one of the fundamental aspects of correlation, really. “Positive correlation” means that “as data set A increases, data set B also increases”. “Negative correlation” means that “as data set A increases, data set B decreases”.

(An important distinction I want to make before moving on is that this does not mean A’s increase causes B’s increase, or vice versa, or anything of the sort.  The world is full of correlations that have no causal relationship with one another.)

The reason the definitions are significant is this – you are looking at the rate of change. The “slope”, for those of you who remember… what was that, high school algebra?

But rate of change – slope – is determined between two points, and is completely independent of Y-intercept, or any starting point. As long as point 1 is separate from point 2 by the expected difference, it doesn’t actually matter what the individual values are.

In other words, the slope between the X/Y data points of 1/100 and 2/200, and the slope between the X/Y data points of 1/0 and 2/100 are exactly the same, despite the values being different.
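A quick way to check that arithmetic is the short Python sketch below. The `slope` helper is a name made up for this illustration; the two pairs of points are the ones from the paragraph above.

```python
def slope(p1, p2):
    """Rise over run between two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)

# The two pairs of X/Y points from the text.
slope_a = slope((1, 100), (2, 200))
slope_b = slope((1, 0), (2, 100))

# Both lines rise by 100 for every step of 1 in X,
# even though none of the Y values match.
print(slope_a, slope_b)
```

Only the differences between the points enter the calculation, so shifting both Y values of a pair by the same amount leaves the result alone.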

It is kind of wacky to think about, but in the most basic terms, 2x+10 has the same slope as 2x+100 but entirely different values, and both correlate against 4x in exactly the same way (they both have a correlation coefficient of 1 with respect to 4x).

In fact, since this is a graphically-related site, let us look at an example.

[Chart: 2X + 10, 2X + 100, 4X, and X^1.5 plotted over time]

Here we have a plot of 2X + 10, 2X + 100, 4X, and X raised to the power of 1.5 over time.

As I mentioned above, both 2X + 10 and 2X + 100 correlate to 4X with a coefficient of 1, meaning that as the first two equations increase, the third equation also increases, and the ratio of the increases is always the same.  This makes sense – all three are straight lines, meaning their slope is constant along their lengths, so comparisons between those slopes will always be equal.
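For anyone who would rather check than take that on faith, here is a minimal Python sketch. The `pearson` helper and the X range of 1 through 20 are assumptions made for this illustration; the chart above may well use a different range, which does not change the result for straight lines.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

xs = list(range(1, 21))  # assumed X range, not taken from the chart

line_10 = [2 * x + 10 for x in xs]
line_100 = [2 * x + 100 for x in xs]
line_4x = [4 * x for x in xs]

r_10 = pearson(line_10, line_4x)
r_100 = pearson(line_100, line_4x)

# Both coefficients come out as 1 (up to floating-point rounding).
print(r_10, r_100)
```

The helper subtracts each series' mean before doing anything else, which is exactly why the +10 and the +100 drop out of the calculation.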

However, both of the first two equations correlate with X^1.5 with a coefficient of 0.99052.  Why?  Well, the slope of the fourth equation changes over time, since it involves a power.  The first two equations are straight lines, so their growth does not mirror the fourth’s curve the way it mirrors the third’s, where the slopes differ but stay in constant proportion.  Still, the first two equations and the fourth are all increasing over time, hence their very strong correlation (coefficients can range from -1 to 1).
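The same kind of check works for the curved line. This is a sketch under assumptions: the `pearson` helper and the X range of 1 through 20 are invented for the illustration, and because the coefficient against a curve depends on the range of X, it will not necessarily reproduce the exact figure quoted above. What it does show is that the two straight lines get the same coefficient as each other.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

xs = list(range(1, 21))  # assumed X range; the chart's range may differ

line_10 = [2 * x + 10 for x in xs]
line_100 = [2 * x + 100 for x in xs]
curve = [x ** 1.5 for x in xs]

r_10_curve = pearson(line_10, curve)
r_100_curve = pearson(line_100, curve)

# Slightly below 1 because the curve bends, but identical for both lines.
print(r_10_curve, r_100_curve)
```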

But the point – that I am perhaps belaboring – is that the first two equations have the exact same correlation with any other equation or line you care to throw on the chart with them.

Why does this matter?

We really do not have any idea how many firearms are in America.  We never will.  I use the 2003 Small Arms Survey as the basis for my “graphics matter” series because it was the most current estimate available when I started the series, and changing reference points midway through is generally bad.

But I literally could have started in 1981 with the (atrociously flawed) assumption that there were no firearms in America, and the math would still work out exactly the same.

“Uh… why?”

We will never know how many firearms there are in America at any given time, but we do have a very accurate accounting of how many firearms are produced in or imported into the country every year.  The BATFE’s Firearms Commerce in the United States Annual Statistical Update provides that data back to 1986, and the Shooting Industry News covers the remainder.  How can this be so accurate?  National registries may be outlawed, but all new commercially produced firearms must be uniquely serialized, and must be declared to the BATFE at the end of the year.  The penalties for “fudging” numbers are… severe.

We have the yearly production data.  Which means we have the rate of change – the slope.

Likewise, we have a… noticeably less-accurate, but still-considered-reliable accounting of the American population and the number of Americans who were killed by other people using firearms at the CDC WISQARS Fatal Injury Report.  I refer to this as “less-accurate”, because I have personally witnessed the CDC correcting data five years past; while I would prefer accurate data over leaving the inaccurate data, it annoys me that, for example, they got the American population wrong by 300,000 residents in one year.

Having to go back and update my data aside, we have the yearly numbers of firearm-related fatalities, which means we can calculate the rate of change.

In other words, I – or you – can compare the two-year-paired slopes for each of those data sets, or the average slope as a whole, or any other combination, and it simply does not matter where the firearms data started.  Only the differences between each year’s data matter, and we have those differences tallied by “authoritative” sources.
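Here is a rough Python sketch of that idea. Every number in it is made up purely for illustration and is not real firearm, population, or fatality data; the `pearson` and `running_stock` helpers are likewise hypothetical names. It builds a running total from the same yearly additions twice, once starting from zero and once starting from an arbitrary guess, and compares each against the same second series.

```python
import math
from itertools import accumulate

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def running_stock(start, additions):
    """Total on hand after each year, given a starting count."""
    return [start + total for total in accumulate(additions)]

# HYPOTHETICAL values, invented for this sketch only.
yearly_additions = [5, 6, 4, 7, 8, 9, 7, 10]
other_series = [3, 1, 4, 1, 5, 9, 2, 6]

stock_from_zero = running_stock(0, yearly_additions)
stock_from_guess = running_stock(250, yearly_additions)

r_zero = pearson(stock_from_zero, other_series)
r_guess = pearson(stock_from_guess, other_series)

# Year-over-year differences do not depend on the starting count either.
diffs_zero = [b - a for a, b in zip(stock_from_zero, stock_from_zero[1:])]
diffs_guess = [b - a for a, b in zip(stock_from_guess, stock_from_guess[1:])]

print(r_zero, r_guess)
print(diffs_zero == diffs_guess)
```

Whatever the coefficient turns out to be for a given pair of series, it is the same for both starting points, because the starting count is a constant that disappears when the mean is subtracted.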

Feel free to play with the situation yourself; I have uploaded the spreadsheet for the above graphic, and you can fiddle with the numbers to see how things change.

We genuinely have no idea how many firearms there are in America, and that is fine.  We do know how many have been produced each year for the past ~35 years, and the only correlation between the change in firearms in America and the change in firearm-related fatalities is negative-to-non-existent, for both raw numbers and per-American rates.  Thus, “more guns = more deaths” cannot be true.
