Submitted by Dense_History_1786 t3_z18xz0 in MachineLearning

Hi, I would like to use Stable Diffusion as part of a side project. I currently have it deployed on a VM in Google Cloud, but it's not scalable. How can I deploy it so that it scales (similar to AWS Lambda, but with a GPU)?

2

Comments


thundergolfer t1_ix9zysc wrote

> How can I deploy it so that it scales?

There's no such general thing as "scalability" (AKA magic scaling sauce). You'll have to be a lot more specific about how your deployment is not handling changes in load parameters.

If I had to guess, I'd say the likely scaling issue is going from a single VM with a single GPU to N GPUs able to run inference in parallel.

If that is your main scaling issue, modal.com can do serverless GPU training/inference against N GPUs almost trivially: twitter.com/charles_irl/status/1594732453809340416.
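
For a rough idea, fanning inference out over N GPUs looks something like the sketch below. This is illustrative only: the model ID is an assumption, and the exact decorator arguments and GPU identifiers depend on the SDK version, so check Modal's docs.

```python
import modal

stub = modal.Stub("stable-diffusion-demo")

# Container image with the inference dependencies baked in.
image = modal.Image.debian_slim().pip_install("diffusers", "transformers", "torch")

@stub.function(image=image, gpu="A10G")  # GPU spec; accepted values vary by SDK version
def generate(prompt: str) -> bytes:
    import io
    import torch
    from diffusers import StableDiffusionPipeline

    # Loading per call keeps the sketch short; in practice you'd cache the
    # pipeline across requests (e.g. with a class-based function).
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # illustrative model ID
        torch_dtype=torch.float16,
    ).to("cuda")
    image_out = pipe(prompt).images[0]
    buf = io.BytesIO()
    image_out.save(buf, format="PNG")
    return buf.getvalue()

if __name__ == "__main__":
    with stub.run():
        # .map fans the prompts out across containers; Modal provisions
        # one GPU per container up to your concurrency limits.
        prompts = ["an astronaut riding a horse"] * 8
        for i, png in enumerate(generate.map(prompts)):
            with open(f"out_{i}.png", "wb") as f:
                f.write(png)
```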

(disclaimer: work for modal)

2

Dense_History_1786 OP t1_ixa3ha2 wrote

Sorry, should have been more clear.
But you are right, I have a single VM and that's the problem. I will check out Modal, thanks.

1

thundergolfer t1_ixalc2h wrote

If it doesn't suit, let me know what didn't work well. Otherwise, I think other serverless GPU platforms will be your best bet. I don't think GCP does serverless GPUs, and although AWS SageMaker supports them, its UX makes development a big pain.

1

machineko t1_ixzkdbt wrote

AWS Lambda provides serverless compute, but you don't need serverless to make something scalable, if you mean scaling from a single GPU to multiple GPUs as your workload grows.

The simplest method is to containerize your application and use GCP's autoscaling, or run it on Kubernetes with an autoscaler. Alternatively, you can use services like stochastic.ai, which deploy your model in a container and provide auto-scaling out of the box; you just upload your model and deploy.
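
In case it helps, "containerize your application" here usually means wrapping the model in a small HTTP server and building an image around it. A minimal sketch (the framework choice and model ID are my assumptions, not anything the services above require):

```python
# server.py -- containerize this, then put replicas behind an autoscaler
# keyed on GPU utilization or request queue depth.
import io

import torch
from diffusers import StableDiffusionPipeline
from fastapi import FastAPI, Response

app = FastAPI()

# Load the model once at container start-up so each replica
# amortizes the load time over many requests.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative model ID
    torch_dtype=torch.float16,
).to("cuda")

@app.get("/generate")
def generate(prompt: str) -> Response:
    image = pipe(prompt).images[0]
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return Response(content=buf.getvalue(), media_type="image/png")
```

Run it with e.g. `uvicorn server:app`, and the autoscaler (GCP managed instance groups, or a Kubernetes HorizontalPodAutoscaler over a GPU node pool) adds or removes replicas as load changes.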

However, I suggest you accelerate your inference first. For example, you can use open-source inference engines (see: https://github.com/stochasticai/x-stable-diffusion) to easily speed up inference 2x or more. That means you can generate 2x more images per dollar on public clouds.
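
The linked repo ships its own optimized inference engines; even before reaching for those, plain diffusers exposes a couple of cheap wins. A sketch of those knobs (fp16 weights plus memory-efficient attention; the xformers package must be installed, and the actual speedup depends on your GPU):

```python
import torch
from diffusers import StableDiffusionPipeline

# fp16 weights roughly halve memory use and speed up inference on most GPUs.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative model ID
    torch_dtype=torch.float16,
).to("cuda")

# Memory-efficient attention kernels; requires `pip install xformers`.
pipe.enable_xformers_memory_efficient_attention()

image = pipe("an astronaut riding a horse").images[0]
image.save("out.png")
```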

1