
recidivistic_shitped t1_j136lsh wrote

GPT-J-6B can load in under 8 GB of VRAM with LLM.int8(). For the same reason, you can also run it in Colab nowadays.
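
In practice that looks something like this (a minimal sketch, assuming the Hugging Face `transformers` stack with `accelerate` and `bitsandbytes` installed; the model ID is EleutherAI's official checkpoint):

```python
# Load GPT-J-6B in 8-bit so the weights fit in roughly 6-8 GB of VRAM.
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # let accelerate place layers on available devices
    load_in_8bit=True,   # LLM.int8() quantization via bitsandbytes
)

inputs = tokenizer("The meaning of life is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```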

175B... offloading that to normal RAM is a really bad idea. At that scale, inference is limited more by FLOPS than by memory. OpenAI's API is cheap enough unless you're serving a substantial userbase.

28