currentscurrents t1_j7xv6j3 wrote

Stats is tremendously useful, especially when your dataset is small by ML standards. Basically every scientific paper relies on statistics to tell you whether or not their result is meaningful.

ML is great when you have millions of data points, but when you only have a hundred it's not going to help you.

6

[deleted] t1_j7y325j wrote

[deleted]

−1

currentscurrents t1_j7y4073 wrote

>Right now basically all progress is with large models,

You mean all progress... in machine learning. A lot of scientific fields have no choice but to make do with far fewer data points.

You can't test a new drug on a million people, especially in early-phase trials. Even outside of medicine, you may have very few samples if you're studying a rare phenomenon.

Statistics gives you tools to make limited conclusions from small samples, and also measure how meaningful those conclusions actually are.
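
To make that concrete, here is a minimal sketch of one such tool, an exact permutation test on the difference in means. The function name and the six measurements are made up for illustration and do not come from any real trial. The point is only that the same procedure yields both a conclusion and a measure of how much weight it can bear.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Enumerates every way of splitting the pooled values into groups of
    the original sizes and returns the fraction of splits whose absolute
    mean difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))

    total = 0
    extreme = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance so floating-point ties count as "as extreme".
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical measurements: three controls, three treated subjects.
control = [1, 2, 3]
treated = [4, 5, 6]
print(exact_permutation_test(control, treated))  # 0.1
```

Even with the two groups perfectly separated, three subjects per arm cannot push the p-value below 2/20 = 0.1, which is the "limited conclusions" part in action.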

6

[deleted] t1_j7y67bi wrote

[deleted]

0

[deleted] t1_j7y9mjs wrote

[deleted]

1

WikiSummarizerBot t1_j7y9nn5 wrote

All models are wrong

>All models are wrong is a common aphorism in statistics; it is often expanded as "All models are wrong, but some are useful". The aphorism acknowledges that statistical models always fall short of the complexities of reality but can still be useful nonetheless. The aphorism originally referred just to statistical models, but it is now sometimes used for scientific models in general. The aphorism is generally attributed to the statistician George Box.

1

Jemimas_witness t1_j7y68en wrote

This is only correct for certain problems; like everything, it has its best use cases. When you only have a hammer, everything looks like a nail.

In medicine, the clinical trial results that change the field often rest on 2000-3000 patients (data points), and groundbreaking advances in medical practice often come from simple statistics and simple methods. Go to the New England Journal of Medicine and pick any trial: the weight of its conclusions rests on survival functions, hazard ratios, and chi-squared statistics. Then go look at the funding section - these projects are funded by millions. The only disciplines in medicine with ML-scale data points are epidemiology and claims-level data, which strays well into econometrics.
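
As a rough illustration of how simple these methods are, here is a sketch of the Pearson chi-squared statistic for a 2x2 table, written out by hand. The function name and the counts are hypothetical and are not taken from any published trial.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for a 2x2 table.

    Layout assumed here:
                  event   no event
        arm 1       a        b
        arm 2       c        d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d

    observed = [a, b, c, d]
    # Expected counts if the outcome were independent of the arm.
    expected = [
        row1 * col1 / n,
        row1 * col2 / n,
        row2 * col1 / n,
        row2 * col2 / n,
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 10 of 30 events in one arm, 20 of 30 in the other.
stat = chi_squared_2x2(10, 20, 20, 10)
print(round(stat, 3))  # 6.667
```

With one degree of freedom, the 5% critical value is about 3.841, so a statistic of 6.667 would count as significant at that level.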

I myself study rare diseases as well as AI/ML applications in medicine and for some projects I’d be stoked to get 80 patients because there just simply aren’t that many around.

2