Submitted by Zondartul t3_zrbfcr in MachineLearning
recidivistic_shitped t1_j136lsh wrote
GPT-J-6B can load in under 8 GB of VRAM with LLM.int8() quantization. For the same reason, you can also run it in Colab nowadays.
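Something like this works, a minimal sketch assuming the Hugging Face transformers + bitsandbytes stack (the prompt and generate settings are just for illustration):

```python
# Load GPT-J-6B in 8-bit via LLM.int8() (bitsandbytes), fitting in <8 GB VRAM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # place layers on the available GPU automatically
    load_in_8bit=True,   # enable LLM.int8() weight quantization
)

prompt = "The meaning of life is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```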
175B? Really bad idea to offload it to normal RAM. At that scale, inference is limited more by FLOPS than by memory, so offloading just moves the bottleneck onto much slower hardware. OpenAI's API is cheap enough unless you're serving a substantial userbase.
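For a sense of scale, here's a rough back-of-envelope sketch (the ~2 FLOPs per parameter per token rule of thumb and the hardware throughput figures are illustrative assumptions, not measurements):

```python
# Rough per-token compute for a dense 175B-parameter decoder.
params = 175e9
flops_per_token = 2 * params           # ~350 GFLOPs per generated token

cpu_flops = 1e12                       # assume ~1 TFLOPS for a strong desktop CPU
gpu_flops = 100e12                     # assume ~100 TFLOPS fp16 for a data-center GPU

print(f"CPU: ~{flops_per_token / cpu_flops:.2f} s/token")          # ~0.35 s/token
print(f"GPU: ~{flops_per_token / gpu_flops * 1e3:.1f} ms/token")   # ~3.5 ms/token
```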