guardiantesla t1_iwum4fl wrote
Reply to [R] RWKV-4 7B release: an attention-free RNN language model matching GPT-J performance (14B training in progress) by bo_peng
Interesting work, and I appreciate the effort. There are a few works that use convolutions as well (referred to as Conformer), but I'm not sure whether that approach has been compared against GPT-style models.
How do you train such large models (AWS, GCP, etc.), and what is the estimated cost?