Submitted by AutoModerator t3_11pgj86 in MachineLearning
DreamMidnight t1_jchxtfy wrote
Reply to comment by LeN3rd in [D] Simple Questions Thread by AutoModerator
Yes, although I am specifically looking into the reasoning of "at least 10 datapoints per variable."
What is the mathematical reasoning of this minimum?
LeN3rd t1_jcislrk wrote
I have not heard this before. Where is it from? I know that you should have more datapoints than parameters in classical models.
DreamMidnight t1_jcrh53z wrote
Here are some sources:
https://home.csulb.edu/~msaintg/ppa696/696regmx.htm
https://developers.google.com/machine-learning/data-prep/construct/collect/data-size-quality (order of magnitude in this case means 10)
LeN3rd t1_jct6arv wrote
Ok, so all of these are linear (logistic) regression models, for which it makes sense to need more data points, because the weights aren't as constrained as in, e.g., a convolutional layer. But it is still a rule of thumb, not an exact proof.
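To see why you need more data points than parameters in a linear model, here is a minimal sketch (my own illustration, not from the linked sources): when the number of samples n is at most the number of predictors p, ordinary least squares can fit pure noise perfectly, which is exactly the overfitting the rule of thumb guards against.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 10  # number of predictors (model parameters, ignoring the intercept)

def fit_residual(n):
    """Least-squares fit of y on n samples of p random predictors; return training residual norm."""
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)  # pure-noise target: there is nothing real to learn
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.linalg.norm(y - X @ coef)

# With n <= p the system is exactly (or under-)determined: the model interpolates the noise.
print(fit_residual(10))   # essentially zero: a "perfect" fit of pure noise, i.e. overfitting
# With n >> p the model cannot memorise the noise, so a large residual remains.
print(fit_residual(100))
```

With n = p = 10 the training residual collapses to ~0 even though the target is random, so training error tells you nothing; only with n well above p does the fit start to reflect real structure (or the lack of it).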
VS2ute t1_jd1irhb wrote
If you have random noise on a variable, it can have a substantial effect on the fit when there are too few samples.
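A quick way to see this point (a sketch of my own, assuming a simple y = 2x + noise model): resample small and large datasets many times and compare how much the estimated slope wanders around the true value.

```python
import numpy as np

rng = np.random.default_rng(1)

def slope_spread(n, trials=2000):
    """Std. dev. of the OLS slope estimate for y = 2x + unit-variance noise over many resamples."""
    slopes = []
    for _ in range(trials):
        x = rng.standard_normal(n)
        y = 2.0 * x + rng.standard_normal(n)  # true slope 2, noisy observations
        slopes.append(np.polyfit(x, y, 1)[0])  # fitted slope for this sample
    return np.std(slopes)

print(slope_spread(5))    # wide spread: noise dominates tiny samples
print(slope_spread(100))  # much tighter around the true slope of 2
```

With only 5 samples the slope estimate swings wildly from resample to resample, while with 100 samples it concentrates near the true value, which is the intuition behind wanting some minimum number of datapoints per variable.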