Submitted by Qwillbehr t3_11xpohv in MachineLearning
Gatensio t1_jd6rixk wrote
Reply to comment by KerfuffleV2 in [D] Running an LLM on "low" compute power machines? by Qwillbehr
Doesn't a 7B-parameter model require something like 12-26GB of RAM depending on precision? How do you run the 30B?
KerfuffleV2 t1_jd7rjvf wrote
There are quantized versions at 8-bit and 4-bit. The 4-bit quantized 30B version is 18GB, so it will run on a machine with 32GB of RAM.
The bigger the model, the more tolerant it seems to be of quantization, so even 1-bit quantized models are in the realm of possibility (it would probably have to be something like a 120B+ model to really work).
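A rough back-of-the-envelope sketch of where those numbers come from (Python; the 1.2x overhead factor is my own assumption to cover quantization scales, the KV cache, and activations, so real usage varies by loader):

```python
# Back-of-the-envelope RAM estimate for a quantized model.
# raw size = parameter count * bytes per weight; the overhead
# factor (assumed, not measured) covers scales/KV cache/activations.

def approx_ram_gb(params_billion: float, bits_per_weight: float,
                  overhead: float = 1.2) -> float:
    bytes_per_weight = bits_per_weight / 8
    raw_bytes = params_billion * 1e9 * bytes_per_weight
    return raw_bytes / 2**30 * overhead

for params, bits in [(7, 32), (7, 16), (7, 4), (13, 4), (30, 4)]:
    print(f"{params}B @ {bits}-bit: ~{approx_ram_gb(params, bits):.1f} GB")
```

That puts 30B at 4-bit around 17GB, which lines up with the ~18GB file fitting comfortably in 32GB of RAM.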
ambient_temp_xeno t1_jd7fm8a wrote
I have the 7B 4-bit alpaca.cpp model running on my CPU (on virtualized Linux), plus this browser open, with 12.3/16GB free. So realistically, to use it without taking over your computer, I'd guess 16GB of RAM is needed; 8GB wouldn't cut it. It might apparently fit in 8GB of system RAM, especially if it's running natively on Linux, but I haven't tried it. I tried to load the 13B and couldn't.
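For reference, launching it looks roughly like this (a sketch based on the alpaca.cpp README of the time; the binary name, default model filename, and -t flag may differ in your build, so check its --help):

```
./chat -m ggml-alpaca-7b-q4.bin -t 4    # -t = number of CPU threads
```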
ambient_temp_xeno t1_jdcpvhv wrote
*Turns out WSL2 uses half your RAM size by default. **13B seems to be weirdly not much better, possibly worse, by some accounts anyway.
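If WSL2's half-the-RAM default is the limit, it can be raised with a .wslconfig file in your Windows user profile folder; a minimal example (14GB is just an illustrative figure for a 16GB machine, leave some headroom for Windows):

```
# %UserProfile%\.wslconfig
[wsl2]
memory=14GB
swap=8GB
```

Run `wsl --shutdown` afterwards so the new limit takes effect.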