
LetterRip t1_j8a436a wrote

I'd go with RWKV: a clever architecture that lets you train an RNN like a normal transformer model.

https://github.com/BlinkDL/RWKV-LM

You can use a quantized variant to run larger models on modest hardware; int8 (or mixed int8/int4) quantization has been shown to work well with LLMs.
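To make the int8 idea concrete, here is a minimal NumPy sketch of symmetric per-tensor int8 quantization, the basic scheme behind many LLM quantization setups. This is an illustration of the technique, not RWKV's actual quantization code; the function names are my own.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: scale so the largest
    absolute weight maps to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(256, 256)).astype(np.float32)  # toy weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Storage drops 4x (float32 -> int8), and the rounding error per weight
# is bounded by half the quantization step (scale / 2).
print(w.nbytes // q.nbytes)                          # 4
print(float(np.abs(w - w_hat).max()) <= scale / 2)   # True
```

Real deployments typically quantize per-channel (or per-block for int4) rather than per-tensor, which tightens the error bound on layers with outlier weights.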
