Submitted by osedao t3_11ayjxt in MachineLearning
BrohammerOK t1_j9wvrl7 wrote
Reply to comment by osedao in [D] Is validation set necessary for non-neural network models, too? by osedao
You can work with 2 splits, which is common practice. For a small dataset you can run 5- or 10-fold cross-validation with shuffling on 75-80% of the dataset (train) for hyperparameter tuning / model selection, refit the best model on the entirety of that set, and then evaluate/test on the remaining 20-25% that you held out. You can repeat the whole process multiple times with different seeds to get a better estimate of the expected performance, assuming that the input data at inference time comes from the same distribution as your dataset.
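A rough sketch of that 2-split workflow, assuming scikit-learn is available. The synthetic dataset, the logistic regression model, the `C` grid, and the `run_once` helper are all placeholders chosen for illustration, not something from the thread:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# Synthetic stand-in for a small dataset (assumption for this sketch).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)


def run_once(seed):
    """One full pass: hold out 20%, tune with 5-fold CV on the rest, test once."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Shuffled 5-fold CV, run only on the training portion.
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},  # hypothetical grid
        cv=cv,
        refit=True,  # refit the best configuration on all of X_train
    )
    search.fit(X_train, y_train)
    # The held-out 20% is touched exactly once, after model selection.
    return search.best_params_, search.score(X_test, y_test)


# Repeat with different seeds to see how much the estimate moves around.
results = [run_once(seed) for seed in range(5)]
scores = np.array([score for _, score in results])
print(f"test accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The spread across seeds gives a feel for how much a single hold-out estimate depends on the particular split, which matters most when the dataset is small.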
BrohammerOK t1_j9ww6yx wrote
If you wanna use something like early stopping, though, you'll have no choice but to use 3 splits, since you need a separate validation set to decide when to stop training.
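A minimal sketch of the 3-split case, again assuming scikit-learn. The 60/20/20 proportions, the `SGDClassifier` model, the patience of 5 epochs, and the synthetic data are illustrative choices, not taken from the thread:

```python
import copy

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption for this sketch).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)

# First carve off 20% as the test set, then split the rest 75/25,
# which gives 60% train, 20% validation, 20% test overall.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=0
)

model = SGDClassifier(random_state=0)
classes = np.unique(y)
best_val, best_model = -np.inf, None
patience, stale = 5, 0  # stop after 5 epochs without improvement

for epoch in range(200):
    # One pass over the training data per call.
    model.partial_fit(X_train, y_train, classes=classes)
    val_score = model.score(X_val, y_val)
    if val_score > best_val:
        # Keep a snapshot of the best model seen so far.
        best_val, best_model, stale = val_score, copy.deepcopy(model), 0
    else:
        stale += 1
        if stale >= patience:
            break

# The test set plays no part in deciding when to stop.
test_score = best_model.score(X_test, y_test)
print(f"best val accuracy: {best_val:.3f}, test accuracy: {test_score:.3f}")
```

Because the validation set is used to pick the stopping point, its score is optimistically biased, which is why the final number comes from the third, untouched split.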