cthorrez t1_j67csjx wrote on January 28, 2023 at 6:04 AM

Reply to comment by Complex_Candidate_28 in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents

That's an interesting topic that I think deserves further investigation. On the surface, it sounds like the size of the LM affects the mechanism by which it is able to "secretly perform gradient descent".

Is finetuning similarly unstable for small-sized LMs?

Complex_Candidate_28 t1_j67cx4a wrote on January 28, 2023 at 6:05 AM

Yes, size also affects finetuning, but finetuning is much less sensitive to it.