Submitted by ThePerson654321 t3_11lq5j4 in MachineLearning
LetterRip t1_jbkdshr wrote
Reply to comment by farmingvillein in [D] Why isn't everyone using RWKV if it's so much better than transformers? by ThePerson654321
Here is what the author stated in the thread:
> Tape-RNNs are really good (both in raw performance and in compression i.e. very low amount of parameters) but they just can't absorb the whole internet in a reasonable amount of training time... We need to find a solution to this!
I think they knew it existed (i.e., they knew there was a deep learning project named RWKV), but they appear to have not known that it met their scaling needs.
farmingvillein t1_jbkx0co wrote
I don't understand the relevance here--tape-RNNs != RWKV, unless I misunderstand the RWKV architecture (certainly possible).