alexander-prince OP t1_j5xl9nz wrote on January 26, 2023 at 6:40 AM

Reply to comment by Luckbot in ELI5: What is Overfitting in machine learning and why is it bad? by alexander-prince

So if I trained it on the 100 pics without a stopping criterion and then tested it on a much larger dataset, say 1000 pics, would the detection error increase? Or is this not considered overfitting, since it was trained on one dataset and then used on a completely different one?

Luckbot t1_j5xlr4y wrote on January 26, 2023 at 6:46 AM

It would not recognize those, and that's exactly what overfitting is: learning ONLY its dataset, but not the general pattern within the dataset that can be applied to new data.

Whether this happens also depends on how complex your ML model is, though (compared to the amount of input data). The simpler it is, the more resistant it is to overfitting (but also the less complex the pattern it can learn).
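To make the complexity trade-off concrete, here is a rough sketch in plain Python. The four training points and three test points are made up for illustration (a y = x trend with a bit of noise on the training values), and the helper names are hypothetical. It compares a straight line against a cubic that is flexible enough to pass through every training point.

```python
def fit_line(xs, ys):
    """Simple model: least-squares straight line. Returns a predict function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(xs, ys):
    """Complex model: the polynomial of degree len(xs) - 1 that passes
    through every training point (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def mse(predict, xs, ys):
    """Mean squared error of a predict function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: the true trend is y = x, the training values are noisy.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 1.5, 1.5, 3.0]
test_x = [0.5, 1.5, 2.5]
test_y = [0.5, 1.5, 2.5]

line = fit_line(train_x, train_y)
wiggly = fit_interpolating_polynomial(train_x, train_y)

print("line   train/test MSE:", mse(line, train_x, train_y), mse(line, test_x, test_y))
print("cubic  train/test MSE:", mse(wiggly, train_x, train_y), mse(wiggly, test_x, test_y))
```

On this toy data the cubic has essentially zero training error but does worse than the line on the unseen points, because it has bent itself around the noise.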

There is a scientist joke: "If you want to perfectly fit a linear regression, just give it 2 data points". Any two points lie exactly on a straight line, so the fit is perfect but tells you nothing. Linear regression is pretty much the simplest model, but giving it too small a dataset makes even that useless.
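The joke can be checked in a few lines of plain Python. This is only a sketch: the six measurements are invented (roughly a y = 2x trend with noise), and the line is fitted to just the first two of them.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the line and the data."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Invented noisy measurements of a roughly y = 2x trend.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.5, 1.4, 4.6, 5.7, 8.3, 9.8]

# "Train" on only the first two points, then check the other four.
slope, intercept = fit_line(xs[:2], ys[:2])
train_error = mean_squared_error(xs[:2], ys[:2], slope, intercept)
new_error = mean_squared_error(xs[2:], ys[2:], slope, intercept)

print("error on the 2 training points:", train_error)
print("error on the 4 unseen points:  ", new_error)
```

The training error comes out as zero (up to floating-point rounding), while the error on the unseen points is large: a "perfect" fit that has learned the noise in two points rather than the trend.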

random_web_browser t1_j5xlz98 wrote on January 26, 2023 at 6:49 AM

If it was overfitted as discussed before, it wouldn't recognize those 1000 pictures, because it wouldn't actually know what a cat is; it would just know exactly the 100 pictures you first gave it. That is exactly overfitting: you are fitting the model to those 100 pictures and not to detecting cats, so it fails on any new data you give it.

That is why you take 80 pictures from the 100 for training and test the algorithm with the remaining 20, to make sure it detects cats and hasn't just overfit to those 80 pictures.
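Here is a toy sketch of that 80/20 check in plain Python. Everything in it is made up for illustration: each "picture" is reduced to a single hypothetical score from 0 to 99, a picture counts as a cat when the score is 50 or more, and every fifth picture is held out for testing (in practice you would usually shuffle randomly before splitting). One model memorizes the training pictures, the other learns a simple threshold.

```python
# 100 toy "pictures": (score, is_cat). The score and the rule are invented.
pictures = [(score, score >= 50) for score in range(100)]

# Hold out every fifth picture: 80 for training, 20 for testing.
train = [p for i, p in enumerate(pictures) if i % 5 != 0]
test = [p for i, p in enumerate(pictures) if i % 5 == 0]


def fit_memorizer(samples):
    """Overfit model: remembers each training picture, guesses 'not a cat' otherwise."""
    table = {score: is_cat for score, is_cat in samples}
    return lambda score: table.get(score, False)


def fit_threshold(samples):
    """Simple model: cut-off halfway between the highest non-cat and lowest cat score."""
    highest_non_cat = max(score for score, is_cat in samples if not is_cat)
    lowest_cat = min(score for score, is_cat in samples if is_cat)
    cutoff = (highest_non_cat + lowest_cat) / 2
    return lambda score: score >= cutoff


def accuracy(predict, samples):
    """Fraction of samples the model labels correctly."""
    return sum(predict(score) == is_cat for score, is_cat in samples) / len(samples)


memorizer = fit_memorizer(train)
threshold = fit_threshold(train)

memorizer_train_acc = accuracy(memorizer, train)
memorizer_test_acc = accuracy(memorizer, test)
threshold_test_acc = accuracy(threshold, test)

print("memorizer train/test accuracy:", memorizer_train_acc, memorizer_test_acc)
print("threshold test accuracy:      ", threshold_test_acc)
```

The memorizer looks perfect on the 80 pictures it has seen and only reveals the problem on the 20 it hasn't, which is the whole point of holding them back.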