currentscurrents OP t1_j627rd0 wrote on January 27, 2023 at 4:34 AM

Reply to comment by master3243 in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents

Meh, transformers have been around for like 5 years and nobody figured this out until now.

I think this mostly speaks to how hard it is to figure out what neural networks are doing. Complexity is irrelevant to the training process (or any other optimization process), so the algorithms they implement are arbitrarily complex.

(or in practice, as arbitrarily complex as the model size and dataset size allow)

master3243 t1_j62aoln wrote on January 27, 2023 at 5:01 AM

You're right they've been around for 5 years (and the idea for attention even before that) but almost every major conference still has new papers coming out giving more insight into transformers (and sometimes algorithms/methods older than it)

I just don't want to see titles flooded with terms like "secretly" or "hidden" or "mysterious", I feel it replaces scientific terms with less scientific but more eye-catchy ones.

Again I totally understand why they would choose this phrasing, and I probably would too, but in a blog post title not a research paper title.

But once again, the actual work seems great and that's all that matters really.