Ford_O t1_iwtrw98 wrote on November 18, 2022 at 7:52 AM

How much faster is the RNN at inference than GPT-J?

bo_peng OP t1_iwts867 wrote on November 18, 2022 at 7:56 AM

RWKV-3 1.5B on A40 (tf32) = always 0.015 sec/token, tested using simple PyTorch code (no CUDA), GPU utilization 45%, VRAM 7823M

GPT2-XL 1.3B on A40 (tf32) = 0.032 sec/token (for ctxlen 1000), tested using HF, GPU utilization 45% too (interesting), VRAM 9655M
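For a rough sense of scale, the two sec/token figures above can be turned into tokens per second and a speedup ratio. This is only arithmetic on the quoted numbers, not a new measurement; the variable and function names are made up for illustration, and the GPT2-XL figure applies to ctxlen 1000 only.

```python
# Figures quoted above: seconds per generated token on an A40 (tf32).
rwkv_sec_per_token = 0.015  # RWKV-3 1.5B
gpt2_sec_per_token = 0.032  # GPT2-XL 1.3B, measured at ctxlen 1000


def tokens_per_second(sec_per_token: float) -> float:
    """Convert a per-token latency into generation throughput."""
    return 1.0 / sec_per_token


rwkv_tps = tokens_per_second(rwkv_sec_per_token)
gpt2_tps = tokens_per_second(gpt2_sec_per_token)

# Ratio of latencies: how many times faster RWKV-3 is per token.
speedup = gpt2_sec_per_token / rwkv_sec_per_token

print(f"RWKV-3:  {rwkv_tps:.2f} tokens/s")
print(f"GPT2-XL: {gpt2_tps:.2f} tokens/s")
print(f"speedup: {speedup:.2f}x")
```

So at ctxlen 1000 that is a bit over 2x per token on these numbers.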

Moreover, RWKV-4 runs in bf16 and is faster than 16-bit GPT models.

Training speed: RWKV-4 1.5B BF16 ctxlen1024 = 106K tokens/s on 8xA100 40G.
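To put the training figure in per-GPU terms, here is a small back-of-the-envelope sketch. It assumes "106K" means 106,000 tokens/s and that the load is split evenly across the 8 GPUs; both are assumptions, and the names are illustrative only.

```python
# Training throughput quoted above (assumed: 106K = 106,000 tokens/s).
total_tokens_per_sec = 106_000
num_gpus = 8    # 8xA100 40G
ctx_len = 1024  # tokens per training sequence

# Assumes an even split of work across GPUs.
per_gpu_tokens_per_sec = total_tokens_per_sec / num_gpus

# Full-length sequences processed per second across all GPUs.
sequences_per_sec = total_tokens_per_sec / ctx_len

print(f"per GPU: {per_gpu_tokens_per_sec:.0f} tokens/s")
print(f"sequences: {sequences_per_sec:.1f} per second")
```

That works out to about 13K tokens/s per A100 under those assumptions.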

Could you also measure the performance on CPU?

So, again: what is the disadvantage of using your method?