Lord_of_Many_Memes t1_j69sr57 wrote
Reply to [D] Could forward-forward learning enable training large models with distributed computing? by currentscurrents
My general feeling is that even if it works, it will take more steps to reach the same loss as backprop, which in some sense cancels out the hardware advantage of the forward-forward setting. I tried it on GPT and WikiText, and it just doesn't converge on real problems; maybe something crucial is still missing.
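For context, forward-forward (Hinton, 2022) trains each layer with a local "goodness" objective (e.g. mean squared activation pushed above a threshold for positive data and below it for negative data), with no gradients flowing between layers. A minimal NumPy sketch of one layer-local update; the threshold `theta`, the toy data, and the learning rate are illustrative choices, not anything from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n = 16, 32, 64
W = rng.normal(scale=0.1, size=(d_out, d_in))

x_pos = rng.normal(size=(n, d_in)) + 2.0  # toy "positive" data (mean-shifted)
x_neg = rng.normal(size=(n, d_in))        # toy "negative" data

def layer(x, W):
    z = x @ W.T
    return z, np.maximum(z, 0.0)          # pre-activation, ReLU output

def local_step(W, x, sign, theta=2.0):
    """Layer-local loss/gradient: sign=+1 pushes goodness above theta,
    sign=-1 pushes it below. No backprop through other layers."""
    z, h = layer(x, W)
    g = (h ** 2).mean(axis=1)             # per-sample goodness
    m = sign * (g - theta)
    loss = np.logaddexp(0.0, -m).mean()   # softplus(-m), logistic loss
    dL_dg = -sign / (1.0 + np.exp(m))     # = -sign * sigmoid(-m)
    dL_dh = dL_dg[:, None] * (2.0 / d_out) * h
    dL_dz = dL_dh * (z > 0)               # ReLU gate
    return loss, dL_dz.T @ x / n

losses = []
for _ in range(200):
    lp, gp = local_step(W, x_pos, +1.0)
    ln, gn = local_step(W, x_neg, -1.0)
    W -= 0.05 * (gp + gn)
    losses.append(lp + ln)
```

Because each layer only sees its own goodness signal, there is no global error credit assignment, which is one plausible reason it needs more steps than backprop to reach the same loss.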