IshKebab t1_j3j1gkz wrote on January 8, 2023 at 10:31 PM

Reply to comment by GoofAckYoorsElf in [P] I built Adrenaline, a debugger that fixes errors and explains them with GPT-3 by jsonathan

Yeah, I imagine that will be an issue for lots of people. What's the SotA in open-source LLMs?

I looked it up. Apparently it's BLOOM, which is slightly bigger than GPT-3 (176B vs. 175B parameters). No idea if it's better.

You need a DGX A100 to run it (only $150k!).
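The reason for the big iron is plain arithmetic on the weights. Here is a rough sketch, assuming 80 GB per GPU and counting the weights only (no activations, KV cache, or framework overhead). The helper names `weight_memory_gb` and `gpus_needed` are hypothetical and just for illustration:

```python
import math

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Memory for the weights alone, in decimal GB (1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

def gpus_needed(n_params: float, dtype: str, gpu_mem_gb: float) -> int:
    """Smallest number of GPUs whose combined memory can hold the weights."""
    return math.ceil(weight_memory_gb(n_params, dtype) / gpu_mem_gb)

BLOOM_PARAMS = 176e9  # BLOOM parameter count
for dtype in ("fp32", "fp16", "int8"):
    gb = weight_memory_gb(BLOOM_PARAMS, dtype)
    n = gpus_needed(BLOOM_PARAMS, dtype, 80.0)  # assumes 80 GB cards
    print(f"{dtype}: {gb:.0f} GB of weights -> at least {n} x 80 GB GPUs")
```

In fp16 that comes to about 352 GB of weights, more than four 80 GB cards can hold, which is why you end up looking at a multi-GPU server.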

Soundwave_47 t1_j3k9npf wrote on January 9, 2023 at 3:33 AM

Anecdotally, it is comparable.

LetterRip t1_j3n91mt wrote on January 9, 2023 at 7:14 PM

I'd go with GLM-130B.

> With INT4 quantization, the hardware requirements can further be reduced to a single server with 4 * RTX 3090 (24G) with almost no performance degradation.

https://github.com/THUDM/GLM-130B
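To see why INT4 makes 4 × RTX 3090 plausible, here's a minimal sketch of generic symmetric absmax quantization of one weight row to signed 4-bit values. This is NOT the GLM-130B code; the function names are hypothetical, values are kept as plain Python ints (real kernels pack two 4-bit values per byte), and the memory figure ignores the per-row scales, activations, and KV cache:

```python
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # range of a signed 4-bit integer

def quantize_int4(row: List[float]) -> Tuple[List[int], float]:
    """Map a row of floats to ints in [-8, 7] using one shared scale."""
    absmax = max(abs(w) for w in row)
    scale = absmax / INT4_MAX if absmax > 0 else 1.0
    q = [max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in row]
    return q, scale

def dequantize_int4(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the 4-bit codes."""
    return [v * scale for v in q]

def int4_weight_gb(n_params: float) -> float:
    """Weights only: 4 bits = 0.5 bytes per parameter, in decimal GB."""
    return n_params * 0.5 / 1e9

row = [0.12, -0.7, 0.33, 0.05, -0.21]
q, scale = quantize_int4(row)
approx = dequantize_int4(q, scale)
max_err = max(abs(a - b) for a, b in zip(row, approx))
print(q, scale, max_err)

glm_gb = int4_weight_gb(130e9)
print(f"GLM-130B int4 weights: {glm_gb:.0f} GB vs {4 * 24} GB on 4 x RTX 3090")
```

Each weight's rounding error is at most half the scale. For 130B parameters the INT4 weights come to about 65 GB, which fits in the 96 GB that four 24 GB cards provide.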

I'd also look into pruning or distillation; you could probably shrink the model by about half again.
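Here's a toy sketch of the pruning half of that idea: unstructured magnitude pruning at 50% sparsity on a flat list of weights. This is only an illustration under simplifying assumptions. `magnitude_prune` is a hypothetical helper, zeroed weights only save memory if you store them in a sparse format (or prune whole rows, heads, or layers instead), and distillation, which trains a smaller student model, is a different technique not shown here:

```python
from typing import List

def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest |w|."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # how many weights to zero
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:k])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]

w = [0.8, -0.05, 0.3, -0.6, 0.01, 0.2]
pruned = magnitude_prune(w, 0.5)
print(pruned)
```

With `sparsity=0.5`, the three smallest-magnitude weights (0.01, -0.05, 0.2) are set to zero. Whether a real LLM tolerates that much pruning without retraining is an empirical question.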