A statistician called Kareem Carr posted a Twitter thread on statistics and bias, and the epistemology in the tweets is bad. The thread says:

Want to know what kinds of bias are fixable with statistics and how?

This is a simple mental map of how different biases affect the process of using algorithms to make changes to the physical world. The way we can fix each bias is as follows…

– Data selection bias: you need an accurate mathematical model of the data creation process

– Statistical bias: you need good statistics

– Bias due to generalization: you need an accurate mathematical model of the observations in the data and in the target population

You need explanations of how your measurements work, of how you’re going to process the measurement results, and of what the measurement results imply for whatever you’re looking at. Carr doesn’t address the issue of how to create the explanations. The data don’t imply any particular explanation since nothing about a set of measurements tells you what the next measurement result will be. Rather, predictions are made using explanations, which are sometimes formalised in mathematical models.

To fix the “bias due to causal assumptions”, we need to fix all 3 smaller biases. At that point, if your model fits the data well then it should be a very close match to the world. In this case, correlation IS causation and we can say the inputs CAUSE the outputs.

Nothing about the fact that your model fits the data implies that you’re right about the causation. The model could agree with reality for 200 years of testing and still fail, either because your theories about how to make observations were wrong, or because your theory about how the world works is wrong but happens to agree with your results up to that time. That’s what happened with Newtonian mechanics, which agreed with observation for roughly 200 years.
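As a rough illustration of how a wrong theory can keep fitting the data, here is a minimal sketch comparing the Newtonian and relativistic formulas for kinetic energy at different speeds. The function names are my own, and kinetic energy is just a convenient stand-in: it is not the observation that historically caused trouble for Newtonian mechanics.

```python
import math

def newtonian_ke(beta):
    """Newtonian kinetic energy in units of m*c^2, where beta = v/c."""
    return 0.5 * beta ** 2

def relativistic_ke(beta):
    """Relativistic kinetic energy (gamma - 1) in units of m*c^2.

    Written as beta^2 / (s * (1 + s)) with s = sqrt(1 - beta^2), which is
    algebraically equal to gamma - 1 but avoids subtracting two nearly
    equal numbers when beta is small.
    """
    s = math.sqrt(1.0 - beta ** 2)
    return beta ** 2 / (s * (1.0 + s))

def relative_gap(beta):
    """Fraction by which the Newtonian value falls short of the relativistic one."""
    rel = relativistic_ke(beta)
    return (rel - newtonian_ke(beta)) / rel

# Speeds as a fraction of the speed of light; 1e-4 is about 30 km/s.
for beta in (1e-4, 1e-2, 0.5, 0.9):
    print(f"v/c = {beta:g}: relative gap = {relative_gap(beta):.3g}")
```

At low speeds the two formulas differ by far less than most measurements could detect, so a good fit there says nothing about which theory is right. The difference only shows up once you test in a regime where the theories make clearly different predictions.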

It’s easy to find spurious correlations; there’s a website devoted to them. How do we know the difference between real and spurious correlations? All we know is whether the correlations are consistent with an alleged causal relationship. Statistics don’t imply any particular explanation. The statistics are either consistent with an explanation or not. If they are consistent, the idea might be right; otherwise we have a problem. That problem may be solved by rejecting the theory or by rejecting the statistics.
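To show how cheap strong correlations are, here is a small sketch that generates pairs of series with no causal connection at all and counts how often they correlate strongly. Everything in it (the function names, the series length of 100, the 0.5 threshold, the seed) is an arbitrary choice of mine for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def noise(rng, n):
    """n independent draws from a standard normal distribution."""
    return [rng.gauss(0, 1) for _ in range(n)]

def random_walk(rng, n):
    """Running sum of independent normal steps: a series that drifts over time."""
    total, walk = 0.0, []
    for step in noise(rng, n):
        total += step
        walk.append(total)
    return walk

def fraction_strongly_correlated(make_series, trials=500, n=100,
                                 threshold=0.5, seed=0):
    """Fraction of independently generated pairs with |r| above threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r = pearson(make_series(rng, n), make_series(rng, n))
        if abs(r) > threshold:
            hits += 1
    return hits / trials

walk_frac = fraction_strongly_correlated(random_walk)
noise_frac = fraction_strongly_correlated(noise)
print(f"independent random walks with |r| > 0.5: {walk_frac:.0%}")
print(f"independent noise series with |r| > 0.5: {noise_frac:.0%}")
```

Independent series that drift over time, as many real-world time series do, correlate strongly with each other a large share of the time, while plain noise almost never does. Nothing in the correlation coefficient tells you which case you are in; that takes an explanation of how the data were generated.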

Carr’s next tweet says:

Because we understand causation in the model, we can investigate the degree to which different variables cause different outcomes and this opens the door to the theoretical investigation of the causal effects of gender and race on the model outcomes.

Causation isn’t the right way to think about human beings. Human beings act according to their ideas. They then experience the consequences of those ideas. The consequences of the ideas people enact may be different from what they intend. Since a lot of people don’t understand economics, the consequences of their ideas are often very different from what they hope for. Healthcare that is “free” is paid for by taxation and inflation, both of which reduce the creation of capital, and that makes stuff more expensive. In addition, having “free” healthcare is effectively like setting the price to zero by fiat, which increases the demand for healthcare. So “free” healthcare is more expensive and people demand more of it. Whether or not that is what people want when demanding “free” healthcare, that is what they get, and no amount of wishing otherwise will change what happens in reality.

It is possible that “free” healthcare policies could have different impacts on different groups of people. For example, if women have more health problems on average than men for biological reasons, their treatment might be worse on average by some measure, such as waiting longer for treatment, since taxation and inflation leave less capital for healthcare. (I just made this idea up on the spot and I would be surprised if it were correct, since I have made no effort to criticise it.) Statistics might be used to test some parts of such an explanation, like looking at rates of illness in women compared to men. But you’re not going to get the explanation from statistics.

For any given statistic that some people interpret as evidence of bias, there will be other explanations. Statistics won’t resolve such disagreements; only coming up with new explanations and criticisms will do that.