currentscurrents OP t1_j658kmf wrote
Reply to comment by cthorrez in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
Interesting. That probably explains why ICL outperformed finetuning by so much in their experiments.