ShadowStormDrift t1_ivfus80 wrote
Reply to comment by shumpitostick in The big data delusion – the more data we have, the harder it is to find meaningful patterns in the world. by IAI_Admin
What about confounding variables?
For example: looking for trends across governments is hard; looking for trends *within* government departments is easier. (Two different departments might trend in opposite directions and cancel each other out when pooled together.)
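A toy illustration of that cancellation (Simpson's-paradox style), with made-up numbers: each department has a clear trend, but pooling the two hides both.

```python
# Hypothetical data: (year, score) for two departments.
dept_a = [(1, 10), (2, 12), (3, 14)]   # trending up
dept_b = [(1, 30), (2, 28), (3, 26)]   # trending down

def slope(points):
    """Ordinary least-squares slope of y on x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den

print(slope(dept_a))           # 2.0  -> clear upward trend
print(slope(dept_b))           # -2.0 -> clear downward trend
print(slope(dept_a + dept_b))  # 0.0  -> pooled, the trends cancel
```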
shumpitostick t1_ivfwf6z wrote
That gets us into the realm of causal inference. It's not really what the author was talking about, but yes, it's a field with a bunch of additional challenges. In this case, more data points might not help, but collecting data on additional variables might. In any case, getting more data will pretty much never make your model worse.
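A quick simulated sketch of why measuring an additional variable can help: here the effect of x on y is confounded by z, so a naive regression of y on x is biased, while adding z as a covariate recovers the true effect. All names and numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)                   # unobserved confounder
x = z + rng.normal(size=n)               # "treatment", influenced by z
y = 2 * x + 3 * z + rng.normal(size=n)   # true causal effect of x is 2

# Naive regression of y on x alone: biased by the confounder z.
X1 = np.column_stack([x, np.ones(n)])
b_naive = np.linalg.lstsq(X1, y, rcond=None)[0][0]

# Measuring z and including it as a covariate removes the bias.
X2 = np.column_stack([x, z, np.ones(n)])
b_adj = np.linalg.lstsq(X2, y, rcond=None)[0][0]

print(round(b_naive, 2))  # ~3.5, biased upward
print(round(b_adj, 2))    # ~2.0, close to the true effect
```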
ajt9000 t1_ivics5n wrote
The main way it's going to make a statistical model worse is by increasing the computational power needed to run it. That's not an argument about the quality of the model's results, though. I agree the author's understanding of statistics is really bad.
shumpitostick t1_ivlb6zi wrote
I was oversimplifying my comments a bit. There is the curse of dimensionality. And in causal inference, if you just use every variable as a confounder, your model can also get worse, because you may be blocking causal (front-door) paths. But if you know what you're doing it shouldn't be a problem. And I haven't met any ML practitioner or statistician who doesn't realize the importance of getting to understand your data and making proper modelling decisions.
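One quick way to see the curse of dimensionality: with a fixed number of points, pairwise distances concentrate as the dimension grows, so "near" and "far" neighbours become hard to tell apart and distance-based methods degrade. A minimal sketch, with arbitrary sizes chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def relative_spread(n_points, dim):
    """Spread of pairwise distances, relative to their mean."""
    pts = rng.uniform(size=(n_points, dim))
    sq = (pts ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * pts @ pts.T
    d = np.sqrt(np.clip(d2, 0, None))
    d = d[np.triu_indices(n_points, k=1)]  # unique pairs only
    return (d.max() - d.min()) / d.mean()

# The relative spread shrinks as dimension grows: distances concentrate.
for dim in (2, 100, 1000):
    print(dim, round(relative_spread(200, dim), 2))
```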