I am currently taking a course in applied statistics, and I’m amazed at how little justification or thought goes into the causal claims drawn from statistics.

First, it occurs to me that statistics has nothing to say about the relationship between the actual things that produce the data. The statistical models are indifferent to the actual things. For our purposes, the things are labels attached to numbers; the labels mean something to us but nothing to the model. The model in itself cannot say whether there is any relationship between the things, only whether there is one between the numbers, and even that is limited.

Consider that we might produce the same numbers from some other cause. In fact, given the vast number of things out there, we can certainly find something that follows a similar quantitative track within whatever significance level we set. There need not be any causal relationship at all; the numbers need only be statistically similar. The model does not care one way or the other what the things are, because the essence of the things finds no place in the model, only the quantitative values we have measured or derived.
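
To make this point concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than anything from my course: the sample size of 30, the 1000 candidate series, the function names, and the cutoff of 0.361, which is approximately the two-sided 5% critical value of Pearson's r for 30 observations. It draws one random "target" series and then counts how many completely unrelated random series correlate with it strongly enough to pass the usual significance threshold.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def count_spurious_matches(n_obs=30, n_candidates=1000, r_cutoff=0.361, seed=0):
    """Count unrelated random series whose |r| with a target exceeds the cutoff.

    r_cutoff=0.361 is approximately the two-sided 5% critical value of
    Pearson's r for n_obs=30 (28 degrees of freedom); it is an assumption
    tied to that sample size and should be changed if n_obs changes.
    """
    rng = random.Random(seed)
    # The "target" is pure noise: nothing below is causally linked to it.
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson_r(target, candidate)) > r_cutoff:
            hits += 1
    return hits

hits = count_spurious_matches()
print(f"{hits} of 1000 unrelated series pass the 5% significance cutoff")
```

By construction, roughly 5% of the unrelated series should clear the cutoff, so searching through enough candidates will always turn up some "significant" companions for any set of numbers.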

Second, my text and professor attempt to avoid this issue by appealing to theory and intuition. This is an attempt to add premises to the model in order to justify the things chosen. The intuition is that one thing at least seems a probable cause of the other. However, on what basis is this intuition rationally justified? It is in fact taken from what we know concerning the essence of the thing. This intuition is uncertain and requires more information, hence our model.

This seems largely to be a non-starter. I think something might be a cause, but I’m unsure. If the model gives statistically significant results, then we have some more evidence that it might be. However, the model gives no rational justification that the thing is a cause, because our justification for its possibly being a cause rests on our intuition. That intuition is founded on an uncertain knowledge of the essence of the thing, and as we have already concluded, the model does not penetrate to the essence of the thing.

Third, the talk concerning hypothesis tests appears almost completely unjustified. We talk about “rejecting the null”; however, even by the terms of frequentist statistics this is not at all what is being done. Some model arrives at an estimated coefficient of some sort. We get a corresponding p-value from some test on that coefficient. This p-value doesn’t tell us to “reject the null” even by its own standard. It tells us the probability of getting a value at least as extreme as the one observed, given that the null hypothesis is true. At best this tells us not to reject the null but to do further research and investigate more deeply.
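
As a rough illustration of what a p-value on a coefficient is, here is a small sketch. Note that it swaps in a permutation test in place of the t-test a regression package would normally report, because the permutation version makes the conditional nature of the number visible. The data are made up, and the function names, the number of shuffles, and the seeds are all my own assumptions. Shuffling y destroys any link to x, so the shuffled datasets stand in for "the null is true"; the p-value is simply the share of them that give a slope at least as extreme as the observed one.

```python
import random

def slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def permutation_p_value(x, y, n_shuffles=5000, seed=0):
    """Share of shuffled datasets whose |slope| is at least the observed |slope|.

    Shuffling y breaks any link to x, so the shuffles play the role of
    the null hypothesis "no relationship between x and y".
    """
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(slope(x, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)

# Made-up data: y is built from x plus noise, purely for illustration.
data_rng = random.Random(1)
x = [float(i) for i in range(20)]
y = [0.5 * xi + data_rng.gauss(0, 3) for xi in x]
p = permutation_p_value(x, y)
print(f"observed slope = {slope(x, y):.3f}, permutation p-value = {p:.4f}")
```

Nothing in this calculation refers to what x and y are, and nothing in it outputs a decision. The number it returns is a probability conditional on the null; whether to "reject" is a convention we layer on top of it.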