jamesvoltage
jamesvoltage t1_j8yjrqo wrote
Reply to comment by MysteryInc152 in [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng
State space models (S4, H3, etc.) are also competitive with transformer language models at around 2B parameters, and they have an effectively infinite context window: https://hazyresearch.stanford.edu/blog/2023-01-20-h3
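To illustrate the context-window point, here is a rough sketch of a plain discrete linear state space recurrence, x_k = A x_{k-1} + B u_k, y_k = C x_k. This is a toy, not the S4 or H3 parameterization (those use structured matrices and other tricks); the function name `ssm_scan` and the example matrices are made up for illustration. The point is only that the state has a fixed size no matter how long the input is, so a sequence can be processed in chunks by carrying the state forward.

```python
def ssm_scan(A, B, C, inputs, state=None):
    """Toy linear SSM over a scalar input sequence.

    A is an n x n list of lists, B and C are length-n lists.
    Returns (outputs, final_state); final_state always has length n.
    """
    n = len(B)
    x = [0.0] * n if state is None else list(state)
    outputs = []
    for u in inputs:
        # State update: x_k = A x_{k-1} + B u_k
        x = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u for i in range(n)]
        # Readout: y_k = C x_k
        outputs.append(sum(C[i] * x[i] for i in range(n)))
    return outputs, x


# Hypothetical 2-dimensional example with two decaying memory channels.
A = [[0.5, 0.0], [0.0, 0.9]]
B = [1.0, 1.0]
C = [1.0, 1.0]

full_out, full_state = ssm_scan(A, B, C, [1.0, 0.0, 0.0, 2.0])

# Same sequence in two chunks, carrying the fixed-size state across.
out_a, mid_state = ssm_scan(A, B, C, [1.0, 0.0])
out_b, end_state = ssm_scan(A, B, C, [0.0, 2.0], state=mid_state)
```

Processing in chunks gives the same outputs as processing the whole sequence at once, which is the sense in which the context is not capped by a window length. How much old information actually survives in the state depends on A.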
jamesvoltage t1_jajjsh3 wrote
Reply to comment by Kaleidophon in [D] backprop through beam sampling ? by SaltyStackSmasher
Here is the nanoChatGPT repository, extended with Gumbel softmax: https://github.com/sanjeevanahilan/nanoChatGPT
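For anyone unfamiliar with the trick, a minimal sketch of Gumbel-softmax sampling follows. It is not taken from the linked repository; the function name `gumbel_softmax` and the `tau` and `rng` parameters are my own choices, and this pure-Python version shows only the forward pass. The idea is to add Gumbel(0, 1) noise to the logits and take a temperature-scaled softmax, which gives a soft, differentiable stand-in for a one-hot sample.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a relaxed one-hot sample (a probability vector) from logits.

    tau is the temperature: lower values push the output toward one-hot.
    rng is any object with a random() method returning floats in [0, 1).
    """
    noisy = []
    for logit in logits:
        # Clamp away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


sample = gumbel_softmax([2.0, 1.0, 0.1], tau=0.5)
```

The argmax of the output is distributed like a sample from softmax(logits), while the soft vector itself is a smooth function of the logits, which is what lets gradients flow through the sampling step in an autograd framework. PyTorch ships this as `torch.nn.functional.gumbel_softmax`, where `hard=True` gives the straight-through variant.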