Complex_Candidate_28 t1_j67cx4a wrote
Reply to comment by cthorrez in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
Yes, the size also affects finetuning but much less sensitive.
Viewing a single comment thread. View all comments