Submitted by Dense_History_1786 t3_z18xz0 in MachineLearning
thundergolfer t1_ix9zysc wrote
> How can I deploy it so that it's scalable?
There's no such general thing as "scalability" (AKA magic scaling sauce). You'll have to be a lot more specific about how your deployment is not handling changes in load parameters.
If I had to guess, I'd say the likely scaling issue is going from a single VM with a single GPU to N GPUs able to run inference in parallel.
If that is your main scaling issue, modal.com can do serverless GPU training/inference against N GPUs almost trivially: twitter.com/charles_irl/status/1594732453809340416.
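A minimal sketch of that pattern, assuming Modal's Python SDK (names like `predict` and the toy "model" are placeholders, and decorator names can vary between SDK versions, so check the docs):

```python
import modal

app = modal.App("scalable-inference")  # called modal.Stub in older SDK versions

# Build the container image once; every GPU worker reuses it.
image = modal.Image.debian_slim().pip_install("torch")

@app.function(gpu="A10G", image=image)
def predict(inputs: list[float]) -> float:
    # Placeholder for real model inference -- load your weights here.
    return sum(inputs)

@app.local_entrypoint()
def main():
    batches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # .map() fans the calls out across containers; Modal autoscales the
    # number of GPU workers to match the queue, so N batches can run on
    # up to N GPUs in parallel.
    for result in predict.map(batches):
        print(result)
```

Run it with `modal run script.py`; the point is that scaling from 1 to N GPUs is a scheduling decision the platform makes, not infrastructure you manage.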
(disclaimer: I work for Modal)
Dense_History_1786 OP t1_ixa3ha2 wrote
Sorry, I should have been more clear.
But you're right, I have a single VM and that's the problem. I'll check out Modal, thanks.
thundergolfer t1_ixalc2h wrote
If it doesn't suit, lmk what didn't work well. Otherwise, I think other serverless GPU platforms will be your best bet. I don't think GCP does serverless GPUs, and although AWS SageMaker supports them, its UX makes development a big pain.