Submitted by ThePerson654321 t3_11lq5j4 in MachineLearning
LetterRip t1_jbkdshr wrote
Reply to comment by farmingvillein in [D] Why isn't everyone using RWKV if it's so much better than transformers? by ThePerson654321
Here is what the author stated in the thread:
> Tape-RNNs are really good (both in raw performance and in compression i.e. very low amount of parameters) but they just can't absorb the whole internet in a reasonable amount of training time... We need to find a solution to this!
I think they knew it existed (i.e., they knew there was a deep learning project named RWKV), but they appear to have not known that it met their scaling needs.
farmingvillein t1_jbkx0co wrote
I don't understand the relevance here--tape-RNNs != RWKV, unless I misunderstand the RWKV architecture (certainly possible).