arg_max t1_j136nbo wrote

CPU implementations are going to be very slow. I'd probably try renting an A100 VM, running some experiments, and measuring VRAM and RAM usage. But I'd be surprised if anything below a 24GB 3090 Ti could do the job. The issue is that needing more than 24GB means stepping up to an A6000 (48GB), which costs as much as four 3090s.
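A minimal sketch of the kind of measurement experiment I mean, using PyTorch's peak-memory counters. The checkpoint name is just a placeholder for whatever model you're actually sizing:

```python
import torch
from transformers import AutoModelForCausalLM

# Reset the counter so we measure only this experiment.
torch.cuda.reset_peak_memory_stats()

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-13b",         # placeholder checkpoint, swap in yours
    torch_dtype=torch.float16,  # fp16 halves the weight footprint vs fp32
).to("cuda")

# Dummy batch of token ids just to exercise a forward pass.
inputs = torch.randint(0, 50000, (1, 128), device="cuda")
with torch.no_grad():
    model(inputs)

peak_gb = torch.cuda.max_memory_allocated() / 1e9
print(f"Peak VRAM: {peak_gb:.1f} GB")
```

Run that on the rented A100 with your target model and batch size, and you'll know quickly whether a 24GB consumer card has any chance.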

18

arg_max t1_j136y5q wrote

Just to give you an idea of the "optimal configuration", though, this is way beyond desktop-PC level:
You will need at least 350GB GPU memory on your entire cluster to serve the OPT-175B model. For example, you can use 4 x AWS p3.16xlarge instances, which provide 4 (instance) x 8 (GPU/instance) x 16 (GB/GPU) = 512 GB memory.

https://alpa.ai/tutorials/opt_serving.html
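For a sense of where those numbers come from, here's the back-of-envelope arithmetic (assuming fp16 weights, which is the usual serving precision; activation and KV-cache overhead come on top of this):

```python
# 175B parameters at 2 bytes each (fp16), weights only.
params = 175e9
bytes_per_param = 2
print(f"Weights alone: {params * bytes_per_param / 1e9:.0f} GB")  # -> 350 GB

# The suggested cluster: 4 x p3.16xlarge, 8 GPUs each, 16 GB per GPU.
instances, gpus_per_instance, gb_per_gpu = 4, 8, 16
print(f"Cluster VRAM: {instances * gpus_per_instance * gb_per_gpu} GB")  # -> 512 GB
```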

9