murrdpirate
murrdpirate t1_jbd9755 wrote
Reply to comment by KD_A in [D] To Make Your Model Better, First Figure Out What's Wrong by pgao_aquarium
>It does not tell you about any other factors which modulate model complexity.
Can you expand on that? My general understanding is that if I'm seeing significantly lower training losses than validation losses, then my model complexity is too high relative to the amount of data I have (unless there's something wrong with the data itself).
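Roughly the kind of check I mean, as a toy sketch (a random forest on synthetic data, purely for illustration and not anything from the post):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss

# Synthetic stand-in data; swap in your own train/val split.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

train_loss = log_loss(y_tr, model.predict_proba(X_tr))
val_loss = log_loss(y_val, model.predict_proba(X_val))
print(f"train loss {train_loss:.3f} vs. val loss {val_loss:.3f}")
# A train loss far below the val loss is the signal I'd read as
# "model complexity is high relative to this dataset."
```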
murrdpirate t1_jb7xg74 wrote
Reply to comment by deephugs in [D] I’m a Machine Learning Engineer for FAANG companies. What are some places looking for freelance / contract work for ML? by doctorjuice
Yeah, I think the issues you mention are real. As a client, I'm often restricted to using US freelancers, so my experience may not be typical. But I've found that experts are generally worth their higher rates.
murrdpirate t1_jb7n8ls wrote
Reply to comment by z_fi in [D] I’m a Machine Learning Engineer for FAANG companies. What are some places looking for freelance / contract work for ML? by doctorjuice
As someone who's done just a bit of freelance work on Upwork, and a ton of client work, what don't you like about Upwork?
murrdpirate t1_j087lji wrote
Reply to comment by Internal-Diet-514 in [P] Implemented Vision Transformers 🚀 from scratch using TensorFlow 2.x by TensorDudee
>I think overfitting would still happen, but we’d still get better validation performance.
I think by definition, overfitting means your validation performance decreases (or at least stops improving) even as your training performance keeps improving.
>So maybe VIT for cifar-10 didn’t add any additional capabilities that were worth it for the problem, just additional complexity
Depends on what you mean by "the problem." The problem could be:
- Get the best possible performance on CIFAR-10 Test
- Get the best possible performance on CIFAR-10 Test, but only train on CIFAR-10 Train
Even if it's the second one, you could likely just reduce the complexity of the ViT model and have it outperform other models. Or keep it the same size, but use heavy regularization during training.
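A rough sketch of both levers in TF 2.x, just to be concrete. The config names here are made up for illustration (they're not from the linked implementation), and `tf.keras.optimizers.AdamW` assumes a fairly recent TF release:

```python
import tensorflow as tf

# Illustrative knobs for shrinking a ViT for CIFAR-10; hypothetical names,
# not taken from the linked TensorFlow implementation.
small_vit_config = dict(
    patch_size=4,    # 8x8 grid of patches on 32x32 CIFAR-10 images
    embed_dim=128,   # much smaller than the usual 768
    num_layers=6,    # fewer transformer blocks than the usual 12
    num_heads=4,
    mlp_dim=256,
    dropout=0.1,
)

# "Heavy regularization during training": augmentation plus weight decay.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomTranslation(0.1, 0.1),
    tf.keras.layers.RandomRotation(0.05),
])

optimizer = tf.keras.optimizers.AdamW(learning_rate=3e-4, weight_decay=0.05)
```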
murrdpirate t1_j07k4v2 wrote
Reply to comment by Internal-Diet-514 in [P] Implemented Vision Transformers 🚀 from scratch using TensorFlow 2.x by TensorDudee
I don't think "worse" is a clear description. The issue is just that it's too complex for CIFAR-10 alone. Any model can be increased in complexity until it overfits, and thus performs worse.
A model that doesn't overfit on CIFAR-10 is unlikely to benefit from pretraining on other datasets, unless those datasets are somehow more closely aligned with CIFAR-10 Test than CIFAR-10 Train is.
murrdpirate t1_jben5uy wrote
Reply to comment by KD_A in [D] To Make Your Model Better, First Figure Out What's Wrong by pgao_aquarium
>Notice that "significantly lower" can't actually be defined.
True. I guess I would say that overfitting is a spectrum, and that there's generally some amount of overfitting happening (unless your training set happens to be significantly more challenging than your test set). So the bigger the gap between train and test, the more overfitting.
>It's tempting to think "test error is 3x train error, we're overfitting". This may or may not be right; there absolutely could be a (more complex) model B with, e.g., training error rate 0.05, test error rate 0.27.
Maybe it's semantics, but in my view, model B is indeed overfitting "more" than model A. But I don't think more overfitting guarantees worse test results; it just increases the likelihood of worse results due to increased variance. I may still choose to deploy model B, but I would view it as a highly overfitting model that happened to perform well.
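A toy version of what I have in mind, comparing a shallow vs. deep decision tree on synthetic data (the exact numbers depend on the data and seed, so this is only illustrative of how I'd look at the gap vs. the absolute test error):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data, just to have something to split.
X, y = make_classification(n_samples=3000, n_features=30, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (3, 15):  # "model A" (simple) vs. "model B" (more complex)
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    tr_err = 1 - clf.score(X_tr, y_tr)
    te_err = 1 - clf.score(X_te, y_te)
    print(f"depth={depth}: train err {tr_err:.3f}, test err {te_err:.3f}, "
          f"gap {te_err - tr_err:.3f}")
# The deeper tree will usually show a bigger train/test gap (more overfitting),
# but that by itself doesn't tell you which one has the lower test error.
```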
Appreciate the response. I also liked your CrossValidated post. I've wondered about that issue myself. Do you think data augmentation should also be disabled in that test?