Submitted by bo_peng t3_1135aew in MachineLearning
csreid t1_j8p5z30 wrote
Reply to comment by farmingvillein in [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng
But RNNs theoretically support infinite context length. Getting it in practice is a problem to be solved, not a fundamental incompatibility like it is with transformers.
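To illustrate the point, here's a minimal NumPy sketch of a generic RNN step (a toy cell, not RWKV's actual formulation): the hidden state has a fixed size, so per-token memory cost is constant no matter how long the stream gets.

```python
import numpy as np

def rnn_step(state, token_embedding, W_h, W_x):
    # One recurrent update: the new state depends only on the previous
    # state and the current token, never on the full history.
    return np.tanh(W_h @ state + W_x @ token_embedding)

d = 8                                   # toy hidden/embedding size
rng = np.random.default_rng(0)
W_h = rng.normal(size=(d, d)) * 0.1     # toy recurrent weights
W_x = rng.normal(size=(d, d)) * 0.1     # toy input weights

state = np.zeros(d)
for t in range(100_000):                # arbitrarily long stream
    token_embedding = rng.normal(size=d)
    state = rnn_step(state, token_embedding, W_h, W_x)

print(state.shape)                      # (8,) -- memory never grows with t
```

Whether the fixed-size state actually *retains* information from early tokens is exactly the "problem to be solved" part.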
farmingvillein t1_j8p7lci wrote
Neither really works for super long contexts, so it is kind of a moot point.
Empirically, both end up with bolt-on approaches to enhance memory over very long contexts, so it isn't really clear a priori that RNNs have a true advantage here.
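For a concrete sense of what "bolt-on" means on the transformer side, here's a minimal sketch of segment-level cached-memory attention in the spirit of Transformer-XL; all names and sizes are illustrative assumptions, not any specific model's API.

```python
import numpy as np

def attend(q, k, v):
    # Standard scaled dot-product attention over whatever keys/values exist.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

d, window = 8, 4
rng = np.random.default_rng(0)
memory = np.zeros((0, d))               # cached activations from past segments

for segment in range(3):                # stream arrives in fixed-size chunks
    x = rng.normal(size=(window, d))
    kv = np.concatenate([memory, x])    # "bolt-on": prepend cached context
    out = attend(x, kv, kv)
    memory = kv[-window:]               # keep only a bounded cache

print(out.shape)                        # (4, 8) per segment
```

The cache extends effective context, but it is still a finite add-on rather than true unbounded memory, which is why neither family clearly wins at long range.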