
killver t1_iz0bw96 wrote

> But because of the hyperparameter optimization on them, the actual errors (like MSE) you calculate will be too optimistic.

This is the only argument, for me, for having a separate test dataset: it lets you make a more unbiased statement regarding accuracy. But I can promise you that no practitioner or researcher will set this test dataset apart and never base a decision on it, even if only subconsciously, which again biases it.
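To make the "separate test set" idea concrete, here is a minimal sketch (dataset, model, and hyperparameter grid are purely illustrative): all tuning decisions are made from k-fold CV scores, and the held-out test set is scored exactly once at the end.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

# Set the test data aside before any tuning; do not look at it again until the end.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

best_alpha, best_cv_mse = None, np.inf
for alpha in [0.01, 0.1, 1.0, 10.0]:
    # Mean MSE across all k folds; this is the (optimistically biased) tuning score.
    cv_mse = -cross_val_score(Ridge(alpha=alpha), X_dev, y_dev,
                              cv=5, scoring="neg_mean_squared_error").mean()
    if cv_mse < best_cv_mse:
        best_alpha, best_cv_mse = alpha, cv_mse

# One final fit and a single test-set evaluation for a less biased error estimate.
final_model = Ridge(alpha=best_alpha).fit(X_dev, y_dev)
test_mse = np.mean((final_model.predict(X_test) - y_test) ** 2)
print(f"alpha={best_alpha}: CV MSE={best_cv_mse:.2f}, test MSE={test_mse:.2f}")
```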

I think the better strategy is to focus on not making overly optimistic statements from k-fold validation scores, e.g. not doing automatic early stopping, not using automatic learning rate schedulers, etc. The goal is to always select only hyperparameters that are optimal across all folds, versus optimal separately per fold (see the sketch below).
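A hedged sketch of that point (names, model, and grid are illustrative assumptions): the hyperparameter is chosen by its mean score over every fold, rather than letting each fold pick its own winner, which is effectively what automatic per-fold early stopping does.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)
alphas = [0.01, 0.1, 1.0, 10.0]
folds = list(KFold(n_splits=5, shuffle=True, random_state=0).split(X))

# scores[i, j] = validation MSE of alphas[i] on fold j
scores = np.empty((len(alphas), len(folds)))
for i, alpha in enumerate(alphas):
    for j, (tr, va) in enumerate(folds):
        model = Ridge(alpha=alpha).fit(X[tr], y[tr])
        scores[i, j] = mean_squared_error(y[va], model.predict(X[va]))

# Recommended: one alpha, chosen by the mean MSE over all folds.
best_overall = alphas[int(np.argmin(scores.mean(axis=1)))]

# Over-optimistic alternative: each fold reports its own best alpha and score,
# which is the per-fold tuning this comment argues against.
per_fold_best = [alphas[int(i)] for i in np.argmin(scores, axis=0)]

print("chosen on all folds:", best_overall)
print("per-fold winners:   ", per_fold_best)
```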


rahuldave t1_iz0cb9o wrote

100% agree with you on both points.

On the first one, the biasing of the test set, my point is: don't worry about it, the bias is minor.

On the second, YES to all you said. You WILL be overfitting otherwise. This is how I like to think about machine learning philosophically: it is not optimization. Find something "good enough" and your generalizability is more likely to be safe, rather than chasing the tippy-top optimum...
