Submitted by Dense_History_1786 t3_z18xz0 in MachineLearning
thundergolfer t1_ix9zysc wrote
> How can I deploy it so that it's scalable?
There's no such general thing as "scalability" (AKA magic scaling sauce). You'll have to be a lot more specific about how your deployment is not handling changes in load parameters.
If I had to guess, I'd say the likely scaling issue is going from a single VM with a single GPU to N GPUs able to run inference in parallel.
If that is your main scaling issue, modal.com can do serverless GPU training/inference against N GPUs almost trivially: twitter.com/charles_irl/status/1594732453809340416.
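A minimal sketch of that pattern, assuming Modal's Python SDK (names like `predict` and the toy "model" are placeholders, and decorator names can vary between SDK versions, so check the docs):

```python
import modal

app = modal.App("scalable-inference")  # called modal.Stub in older SDK versions

# Build the container image once; every GPU worker reuses it.
image = modal.Image.debian_slim().pip_install("torch")

@app.function(gpu="A10G", image=image)
def predict(inputs: list[float]) -> float:
    # Placeholder for real model inference -- load your weights here.
    return sum(inputs)

@app.local_entrypoint()
def main():
    batches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # .map() fans the calls out across containers; Modal autoscales the
    # number of GPU workers to match the queue, so N batches can run on
    # up to N GPUs in parallel.
    for result in predict.map(batches):
        print(result)
```

Run it with `modal run script.py`; the point is that scaling from 1 to N GPUs is a scheduling decision the platform makes, not infrastructure you manage.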
(disclaimer: I work for Modal)
Dense_History_1786 OP t1_ixa3ha2 wrote
Sorry, I should have been more clear.
But you're right, I have a single VM and that's the problem. I'll check out Modal, thanks.
thundergolfer t1_ixalc2h wrote
If it doesn't suit, lmk what didn't work well. Otherwise, I think other serverless GPU platforms will be your best bet. I don't think GCP does serverless GPUs, and although AWS SageMaker supports them, its UX makes development a big pain.