
XtremePocket t1_ir0xpjw wrote

μTransfer (Mu Transfer) gives you a (sort of) theoretically guaranteed way to transfer the optimal hyperparameters found on scaled-down versions of a model to the full-size model. I haven't tried it in practice, but maybe give that a try?
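To give a rough feel for the idea, here is a minimal sketch of the width-scaling rule as I understand it for Adam: tune the learning rate on a narrow proxy model, then shrink the learning rate of the hidden (matrix-like) weights in proportion to how much wider the target model is. The helper name `transfer_adam_lrs` and the example widths are made up for illustration. This only covers the learning-rate part; the actual method also changes initialization and output scaling, so treat it as a sketch, not the real recipe.

```python
def transfer_adam_lrs(base_lr, base_width, target_width):
    """Hypothetical helper: rescale an Adam learning rate tuned at base_width."""
    if base_width <= 0 or target_width <= 0:
        raise ValueError("widths must be positive")
    # How many times wider the target model is than the tuned proxy.
    width_mult = target_width / base_width
    return {
        # Assumption: hidden (matrix-like) weights scale as 1 / width_mult.
        "hidden": base_lr / width_mult,
        # Assumption: vector-like parameters (input weights, biases) keep the tuned value.
        "vector_like": base_lr,
    }


# Example: learning rate tuned at width 256, reused at width 1024.
lrs = transfer_adam_lrs(base_lr=1e-3, base_width=256, target_width=1024)
print(lrs)
```

In practice you would probably use an existing implementation that handles the parameter groups for you instead of hand-rolling this.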
