XtremePocket t1_ir0xpjw wrote on October 4, 2022 at 3:18 PM

muTransfer (μTransfer) has a (sort of) theoretically guaranteed way of tuning hyperparameters on scaled-down versions of a model and transferring the optimal values to the full-size model. I haven't tried it in practice, but maybe give that a try?
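
A rough sketch of the idea, not a full implementation: tune the learning rate on a narrow proxy model, then rescale it for the wide model. It shows only one rule, which assumes the Adam optimizer and applies to hidden weight matrices: the learning rate shrinks in proportion to the width increase. The full parametrization also changes initialization and output-layer scaling, which this leaves out. The function name `scaled_hidden_lr` and the widths used are hypothetical, chosen for illustration.

```python
def scaled_hidden_lr(base_lr: float, base_width: int, target_width: int) -> float:
    """Rescale a learning rate tuned at base_width for use at target_width.

    Simplified assumption: Adam optimizer, hidden (matrix-like) weights only,
    with the learning rate scaling as 1 / width relative to the proxy model.
    """
    if base_width <= 0 or target_width <= 0:
        raise ValueError("widths must be positive")
    return base_lr * base_width / target_width


# Learning rate found by a sweep on a small proxy model (hypothetical value).
proxy_lr = 1e-3

# Reuse it for a model that is 4x wider instead of sweeping again.
large_lr = scaled_hidden_lr(proxy_lr, base_width=256, target_width=1024)
print(large_lr)
```

The point of the approach is that the expensive sweep happens only at the small width; the large model is trained once with the transferred values.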