bo_peng OP wrote
Reply to comment by Competitive-Rub-1958 in [R] RWKV-4 7B release: an attention-free RNN language model matching GPT-J performance (14B training in progress) by bo_peng
RWKV 7B is faster than GPT-J 6B, and RWKV actually scales great :)
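For context on why the RNN form is fast: RWKV's attention replacement can be evaluated recurrently, so generating each new token costs a fixed amount of work per channel instead of attending over the whole context. Here is a minimal NumPy sketch of the numerically stable WKV recurrence; the simplifications are mine (single sequence, no token-shift, no receptance gating), and the function name is illustrative:

```python
import numpy as np

def wkv_step(k, v, state, w, u):
    """One recurrent step of a simplified RWKV-4 WKV operator.

    w: (negative) per-channel time decay, so the past shrinks by e^w each step.
    u: per-channel bonus applied only to the current token.
    state: (num, den, m) = running numerator, denominator, and the max
           exponent kept for numerical stability.
    """
    num, den, m = state

    # Output for this token: blend the accumulated past with the current (k, v).
    m_out = np.maximum(m, u + k)
    a, b = np.exp(m - m_out), np.exp(u + k - m_out)
    wkv = (a * num + b * v) / (a * den + b)

    # Update the state: decay the past by e^w and fold in the current token.
    m_new = np.maximum(m + w, k)
    a, b = np.exp(m + w - m_new), np.exp(k - m_new)
    return wkv, (a * num + b * v, a * den + b, m_new)

# Toy usage: constant work per token, no matter how long the context gets.
d = 8
w = -np.exp(np.random.randn(d))   # negative, so e^w is a decay in (0, 1)
u = np.random.randn(d)
state = (np.zeros(d), np.zeros(d), np.full(d, -np.inf))
for k, v in zip(np.random.randn(100, d), np.random.randn(100, d)):
    out, state = wkv_step(k, v, state, w, u)
```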
If you check the table, RWKV is better than GPT-Neo on everything at 3B (while the smaller RWKV models lag behind on LAMBADA).
But GPT-J uses rotary positional embeddings and is therefore quite a bit better than GPT-Neo, so I expect RWKV to surpass it at 14B.
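(Rotary = RoPE: rotate each interleaved pair of query/key channels by a position-dependent angle, so attention scores depend on relative position. A minimal NumPy sketch of the idea; note GPT-J only applies it to the first rotary_dim channels of q and k, and the function name here is mine:)

```python
import numpy as np

def apply_rotary(x, base=10000.0):
    """Rotate interleaved channel pairs of x (shape: seq_len x dim, dim even)
    by position-dependent angles, GPT-J style."""
    seq_len, dim = x.shape
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))  # one freq per pair
    angles = np.outer(np.arange(seq_len), inv_freq)          # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Applied to both queries and keys before the dot product, the score between positions m and n ends up depending only on m - n.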
Moreover, RWKV 3B becomes stronger after being trained on more tokens, and I am doing the same for the 7B model.