Submitted by Zondartul t3_zrbfcr in MachineLearning
I'm trying to figure out how to run something like GPT-J, FLAN-T5, etc., on my PC, without using cloud compute services (for privacy and other reasons). However, GPT-J-6B needs either ~14 GB of VRAM or roughly 4x as much plain RAM.
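For concreteness, here's a minimal sketch of what loading GPT-J-6B in half precision looks like with Hugging Face transformers (the ~14 GB figure is roughly the fp16 weights plus runtime overhead; model ID and prompt are just what I'd try, not a tested recipe):

```python
# Minimal sketch: loading GPT-J-6B in fp16 with Hugging Face transformers.
# The fp16 weights alone are ~12 GB, hence the ~14 GB VRAM figure above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision: ~2 bytes per parameter
).to("cuda")

prompt = "The meaning of life is"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```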
Upgrading my PC to 48 GB of RAM is possible, and 16 or 24 GB graphics cards are available to the general public (though they cost as much as a car), but anything beyond that is in the realm of HPC, datacenter hardware, and "GPU accelerators". 128 GB GPUs exist out there somewhere, but the distributors don't even list a price; it's just "get a quote" and "contact us", meaning it's super expensive and you need to be the CEO of a medium-sized company for them to even talk to you?
I'm trying to figure out if it's possible to run the larger models (e.g. 175B GPT-3 equivalents) on consumer hardware, perhaps by doing a very slow emulation using one or several PCs such that their collective RAM (or swap SSD space) matches the VRAM those beasts need.
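If I'm reading the Hugging Face docs right, the "spill into RAM and then disk" setup I have in mind would look something like this via Accelerate's `device_map` (the model ID is a small stand-in and the folder name is a placeholder; I have no idea how this behaves at 175B scale):

```python
# Hedged sketch of GPU -> CPU RAM -> disk spill-over via Hugging Face
# Accelerate (requires `pip install accelerate`); untested at 175B scale.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",       # stand-in; a 175B checkpoint would go here
    torch_dtype=torch.float16,
    device_map="auto",           # fill VRAM first, then CPU RAM, then disk
    offload_folder="offload",    # weights that fit nowhere spill to this folder
)
```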
So the question is "will it run super slowly" or "will it fail immediately due to completely incompatible software / being impossible to configure for anything other than real datacenter hardware"?
CKtalon t1_j13dg5b wrote
Just forget about it.
Yes, it's possible to do it on CPU/RAM (Threadripper builds with >256 GB of RAM plus some assortment of 2x-4x GPUs), but the speed is so slow that working with it is pointless. DeepSpeed or Hugging Face Accelerate can spread the model across GPU and CPU, but even so, it will be stupid slow, probably MINUTES per token.
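For what it's worth, the GPU+CPU spread on that kind of Threadripper build is roughly this with Hugging Face (a sketch only; the per-device memory caps and the 66B checkpoint are illustrative, not something I've benchmarked):

```python
# Illustrative sketch: capping each device so Accelerate spreads a large
# checkpoint across two 24 GB GPUs plus CPU RAM. Expect it to be very slow.
import torch
from transformers import AutoModelForCausalLM

max_memory = {0: "22GiB", 1: "22GiB", "cpu": "250GiB"}  # per-device budgets
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-66b",          # stand-in for a very large checkpoint
    torch_dtype=torch.float16,
    device_map="auto",           # place layers to respect max_memory
    max_memory=max_memory,
)
```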
We are at least 5 years away from consumer hardware being able to run 175B+ models on a single machine (even one with 4 GPUs).
20B models are in the realm of consumer hardware (a 3090/4090) with INT8 quantization: slow, but still possible.
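E.g. something like GPT-NeoX-20B in INT8 on a single 24 GB card, assuming bitsandbytes is installed (again a sketch, not a benchmarked recipe):

```python
# Sketch: INT8 loading of a 20B model on a single 24 GB consumer GPU.
# Requires `pip install bitsandbytes accelerate`; ~1 byte per parameter,
# so the weights just about fit in 24 GB of VRAM.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    device_map="auto",
    load_in_8bit=True,  # bitsandbytes INT8 quantization
)
```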