Submitted by bo_peng t3_1135aew in MachineLearning
bo_peng OP t1_j8qhiyk wrote
Reply to comment by farmingvillein in [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng
RWKV is the exception. When you look at loss against token position, it is comparable with transformers.
You can tell that from the generation results too.
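For anyone following along, "loss against token position" usually means averaging the per-token cross-entropy over many sequences, separately for each position in the context window. A model that makes good use of long context should keep improving (or at least not degrade) at later positions. Below is a minimal NumPy sketch of that measurement, not the evaluation code used for RWKV. The function name `loss_by_position` and the array shapes are assumptions made for illustration.

```python
import numpy as np

def loss_by_position(logits, targets):
    """Mean cross-entropy at each token position (hypothetical helper).

    logits:  float array, shape (num_sequences, seq_len, vocab_size)
    targets: int array,   shape (num_sequences, seq_len)
    returns: float array, shape (seq_len,)
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-probability the model assigned to each target token.
    token_loss = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    # Average over sequences, keeping the position axis.
    return token_loss.mean(axis=0)
```

Plotting the returned array against position index for two models on the same held-out text gives the kind of comparison described above.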
farmingvillein t1_j8qj1u7 wrote
> RWKV is the exception. When you look at loss against token position, it is comparable with transformers.
Can you link to what you are referring to? If I missed it in the original post, my apologies.