[R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 tokens/s on a 3090 after the latest optimization (16 GB VRAM is enough, and you can stream layers to save even more VRAM)
Submitted by bo_peng (t3_11teywc) to r/MachineLearning on March 17, 2023 at 2:49 AM · 32 comments · 101 points (a layer-streaming sketch follows this list)
[R] RWKV (100% RNN) can genuinely model ctx4k+ documents in the Pile, and RWKV model+inference+generation in 150 lines of Python
Submitted by bo_peng (t3_11iwt1b) to r/MachineLearning on March 5, 2023 at 1:11 PM · 26 comments · 63 points (the core recurrence is sketched after this list)
[P] ChatRWKV v2 (can run RWKV 14B with 3 GB VRAM), the RWKV pip package, and finetuning to ctx16K
Submitted by bo_peng (t3_11f9k5g) to r/MachineLearning on March 1, 2023 at 5:23 PM · 37 comments · 89 points (a pip-package usage sketch follows this list)
[R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model
Submitted by bo_peng (t3_1135aew) to r/MachineLearning on February 15, 2023 at 6:44 PM · 37 comments · 268 points
[P] RWKV 14B Language Model & ChatRWKV: a pure RNN (attention-free), scalable and parallelizable like Transformers
Submitted by bo_peng (t3_10eh2f3) to r/MachineLearning on January 17, 2023 at 4:54 PM · 19 comments · 110 points
[R] RWKV-4 7B release: an attention-free RNN language model matching GPT-J performance (14B training in progress)
Submitted by bo_peng (t3_yxt8sa) to r/MachineLearning on November 17, 2022 at 3:32 PM · 22 comments · 172 points
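
The layer streaming mentioned in the March 17 post is controlled by a "strategy" string when loading the model. Below is a minimal sketch, assuming the rwkv pip package and the strategy-string conventions described in the ChatRWKV README; the model path and the layer split are illustrative, not prescriptive:

```python
from rwkv.model import RWKV

# Illustrative checkpoint path: point this at a real RWKV-4 14B .pth file.
MODEL_PATH = 'RWKV-4-Pile-14B-ctx8192'

# All layers on GPU in fp16 (fastest; needs the full ~16 GB of VRAM for 14B):
model = RWKV(model=MODEL_PATH, strategy='cuda fp16')

# First 10 layers on GPU quantized to int8, the rest in fp16 (less VRAM):
model = RWKV(model=MODEL_PATH, strategy='cuda fp16i8 *10 -> cuda fp16')

# Keep 10 layers resident on the GPU and stream the remaining layers to it
# on demand (the '+' suffix): slowest option, but minimizes VRAM.
model = RWKV(model=MODEL_PATH, strategy='cuda fp16i8 *10+')
```

The tradeoff is the one the post title describes: fewer resident layers means less VRAM but more host-to-device traffic per token, hence lower tokens/s.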
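The "150 lines of Python" post refers to a minimal NumPy implementation of RWKV-4. Its heart is the WKV recurrence, which replaces attention with an exponentially decayed weighted average over past tokens. Here is a sketch of one numerically stabilized step, following the published RWKV-4 formulation; the variable names mirror the reference code, but this is a simplified extract, not the full model:

```python
import numpy as np

def wkv_step(k, v, aa, bb, pp, time_first, time_decay):
    """One step of the RWKV-4 WKV recurrence, numerically stabilized.

    aa, bb     : decayed numerator / denominator accumulated over past tokens
    pp         : running max of the log-scale exponents (for stability)
    time_first : per-channel bonus (u) applied to the current token
    time_decay : per-channel decay (w); negative in the real model
    All arguments are float vectors over the channel dimension.
    """
    # Output: blend the accumulated past with the current token's k, v.
    ww = time_first + k
    qq = np.maximum(pp, ww)
    e1 = np.exp(pp - qq)
    e2 = np.exp(ww - qq)
    wkv = (e1 * aa + e2 * v) / (e1 * bb + e2)

    # State update: decay the past by time_decay, then fold in the current token.
    ww = pp + time_decay
    qq = np.maximum(ww, k)
    e1 = np.exp(ww - qq)
    e2 = np.exp(k - qq)
    return wkv, e1 * aa + e2 * v, e1 * bb + e2, qq

# Initial state before the first token (D = channel dimension):
D = 4
aa, bb, pp = np.zeros(D), np.zeros(D), np.full(D, -1e30)
```

Because the state (aa, bb, pp) is a fixed-size vector per channel, the cost per generated token is constant regardless of context length, which is what lets a pure RNN handle ctx4k+ documents and still train in parallel like a Transformer.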
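The RWKV pip package from the March 1 post wraps loading, tokenization, and sampling. A minimal generation sketch, assuming `pip install rwkv`, a downloaded checkpoint, and the 20B_tokenizer.json tokenizer file alongside it; paths and sampling values are illustrative:

```python
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# Illustrative paths: point these at a real checkpoint and tokenizer file.
model = RWKV(model='RWKV-4-Pile-14B-ctx8192', strategy='cuda fp16')
pipeline = PIPELINE(model, '20B_tokenizer.json')

args = PIPELINE_ARGS(
    temperature=1.0,
    top_p=0.85,
    alpha_frequency=0.2,  # frequency penalty, discourages verbatim repetition
    alpha_presence=0.2,   # presence penalty, encourages new tokens
)

prompt = 'Here is a short story about an RNN that scaled:\n'
out = pipeline.generate(prompt, token_count=100, args=args)
print(out)
```

The same strategy strings shown in the earlier sketch apply here, so the 3 GB VRAM figure from the post title corresponds to an aggressive quantize-and-stream strategy rather than a different model.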