Submitted by jsonathan t3_106q6m9 in MachineLearning
IshKebab t1_j3j1gkz wrote
Reply to comment by GoofAckYoorsElf in [P] I built Adrenaline, a debugger that fixes errors and explains them with GPT-3 by jsonathan
Yeah I imagine that will be an issue for lots of people. What's the SotA in open source LLMs?
I looked it up. Apparently it's BLOOM. Slightly bigger than GPT-3. No idea if it is better.
You need a DGX A100 to run it (only $150k!).
Soundwave_47 t1_j3k9npf wrote
Anecdotally, it is comparable.
LetterRip t1_j3n91mt wrote
I'd do GLM-130B
> With INT4 quantization, the hardware requirements can further be reduced to a single server with 4 * RTX 3090 (24G) with almost no performance degradation.
https://github.com/THUDM/GLM-130B
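To make the INT4 claim concrete: the quoted hardware reduction works because each weight is stored in 4 bits plus a small per-group scale, instead of 16 or 32 bits. This is not GLM-130B's actual kernel, just a minimal NumPy sketch of symmetric per-group INT4 quantization (the function names and group size are illustrative, not from the repo):

```python
import numpy as np

def quantize_int4(w, group_size=64):
    """Symmetric per-group INT4 quantization: ints in [-8, 7] plus one fp scale per group."""
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid divide-by-zero for all-zero groups
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate fp32 weights from INT4 codes and scales."""
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
max_err = np.abs(w - w_hat).max()
```

At 4 bits per weight plus one scale per 64 weights, storage drops to roughly 1/4 of fp16, which is how 130B parameters fit on 4 x 24 GB cards.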
I'd also look into pruning/distillation; you could probably shrink the model by about another half.
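The simplest pruning variant is unstructured magnitude pruning: zero out the smallest-magnitude weights and keep a sparsity mask. A hedged NumPy sketch (illustrative only; real LLM pruning typically re-trains or calibrates after masking):

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Zero out the fraction `sparsity` of smallest-magnitude weights."""
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w).ravel())[k]  # k-th smallest magnitude
    mask = np.abs(w) >= thresh              # keep weights at or above threshold
    return w * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
w_pruned, mask = magnitude_prune(w, sparsity=0.5)
```

Zeroed weights only save memory if stored in a sparse format; distillation (training a smaller dense student on the large model's outputs) is the route that shrinks the dense parameter count itself.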