bo_peng OP t1_j6gnqrp wrote
Reply to comment by Gody_Godee in [P] RWKV 14B Language Model & ChatRWKV : pure RNN (attention-free), scalable and parallelizable like Transformers by bo_peng
arXiv:2006.16236 is bad at any nontrivial task, such as language modeling.