
LetterRip t1_j8a436a wrote

I'd go with RWKV: a clever architecture that lets you train an RNN like a normal transformer model.

https://github.com/BlinkDL/RWKV-LM

You can use a quantized variant to run larger models on modest hardware; int8 (or mixed int8/int4) quantization has been shown to work well with LLMs.
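To make the int8 idea concrete, here is a minimal NumPy sketch of symmetric per-tensor int8 quantization, the basic scheme behind many LLM quantization setups. This is an illustration of the technique, not RWKV's actual quantization code; the function names are my own.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: scale so the largest
    absolute weight maps to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(256, 256)).astype(np.float32)  # toy weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Storage drops 4x (float32 -> int8), and the rounding error per weight
# is bounded by half the quantization step (scale / 2).
print(w.nbytes // q.nbytes)                          # 4
print(float(np.abs(w - w_hat).max()) <= scale / 2)   # True
```

Real deployments typically quantize per-channel (or per-block for int4) rather than per-tensor, which tightens the error bound on layers with outlier weights.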
