guardiantesla t1_iwum4fl wrote
Reply to [R] RWKV-4 7B release: an attention-free RNN language model matching GPT-J performance (14B training in progress) by bo_peng
Interesting work, and I appreciate the effort. There are a few works that use convolutions as well (referred to as Conformer), but I'm not sure whether that approach has been compared against GPT-style models.
How do you train such large models (AWS, GCP, etc.), and what is the estimated cost?