Hi, I would like to use Stable Diffusion as part of a side project. I currently have it deployed on a VM in Google Cloud, but it's not scalable. How can I deploy it so that it's scalable (similar to AWS Lambda, but with a GPU)?

Comments


thundergolfer t1_ix9zysc wrote on November 21, 2022 at 9:47 PM

> How can I deploy it so that it's scalable?

There's no such general thing as "scalability" (AKA magic scaling sauce). You'll have to be a lot more specific about how your deployment is not handling changes in load parameters.

If I had to guess, I'd say the likely scaling issue is going from a single VM with a single GPU to N GPUs able to run inference in parallel.

If that is your main scaling issue, modal.com can do serverless GPU training/inference against N GPUs almost trivially: twitter.com/charles_irl/status/1594732453809340416.

(disclaimer: work for modal)
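To make the "single GPU to N GPUs in parallel" point concrete, here is a minimal Python sketch. It is not Modal's API or any platform's API: `generate_image` is a hypothetical stand-in that sleeps instead of running Stable Diffusion, and each worker thread plays the role of one GPU. Prompts are sharded round-robin so each "GPU" handles one request at a time, and total wall time drops roughly in proportion to the number of GPUs.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def generate_image(prompt: str, gpu_id: int) -> str:
    """Hypothetical stand-in for one Stable Diffusion call on one GPU."""
    time.sleep(0.1)  # pretend each image takes 0.1 s of GPU time
    return f"image for {prompt!r} from gpu {gpu_id}"


def run_shard(gpu_id: int, shard: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """One worker owns one GPU and works through its shard sequentially."""
    return [(idx, generate_image(prompt, gpu_id)) for idx, prompt in shard]


def generate_batch(prompts: list[str], num_gpus: int) -> list[str]:
    """Split prompts round-robin into num_gpus shards and run the shards in parallel."""
    indexed = list(enumerate(prompts))
    shards = [indexed[g::num_gpus] for g in range(num_gpus)]
    results: list[str] = [""] * len(prompts)
    with ThreadPoolExecutor(max_workers=num_gpus) as pool:
        for shard_results in pool.map(run_shard, range(num_gpus), shards):
            for idx, image in shard_results:
                results[idx] = image  # restore the original prompt order
    return results


if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(8)]
    start = time.perf_counter()
    images = generate_batch(prompts, num_gpus=4)
    print(f"{len(images)} images in {time.perf_counter() - start:.2f} s")
```

With 8 prompts and 4 simulated GPUs this should finish in roughly 0.2 s instead of the 0.8 s a single GPU would need. In a real deployment each shard would usually be a separate GPU container or VM pulling from a shared queue, which is the part serverless GPU platforms manage for you.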

Dense_History_1786 OP t1_ixa3ha2 wrote on November 21, 2022 at 10:12 PM

Sorry, I should have been clearer.
But you're right: I have a single VM and that's the problem. I'll check out Modal, thanks.

thundergolfer t1_ixalc2h wrote on November 22, 2022 at 12:27 AM

If it doesn't suit you, let me know what didn't work well. Otherwise, I think other serverless GPU platforms will be your best bet. I don't think GCP offers serverless GPUs, and although AWS SageMaker supports it, its UX makes development a big pain.



machineko t1_ixzkdbt wrote on November 27, 2022 at 4:59 PM

AWS Lambda is serverless, but you don't need serverless to make something scalable, if what you mean is scaling from a single GPU to multiple GPUs as your workload grows.

The simplest method is to containerize your application and use GCP's autoscaling (for example, a managed instance group of GPU VMs). You can also autoscale it on Kubernetes. Alternatively, you can use a service like stochastic.ai, which containerizes and deploys your model and provides autoscaling out of the box. You just need to upload your model and deploy.
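As a sketch of what an autoscaler actually decides, the Kubernetes Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping any change when that ratio is within a tolerance (0.1 by default). The Python below re-implements only that rule. Treating the metric as "queued image requests per GPU replica" and the min/max bounds are assumptions for illustration; GCP's instance-group autoscaler has its own policies.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,  # assumed metric: average queued requests per GPU replica
    target_metric: float,   # assumed target: queued requests we tolerate per replica
    min_replicas: int = 1,
    max_replicas: int = 8,
    tolerance: float = 0.1,
) -> int:
    """HPA-style rule: scale replicas in proportion to metric / target, then clamp."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: don't thrash
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(1, 12, 3))   # load spike on a single GPU
    print(desired_replicas(4, 0.5, 3))  # traffic dies down
```

For example, one replica averaging 12 queued requests against a target of 3 scales to 4 replicas, while 4 replicas averaging 3.2 queued requests sit inside the tolerance and stay at 4.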

However, I suggest you "accelerate" your inference first. For example, you can use open-source inference engines (see: https://github.com/stochasticai/x-stable-diffusion) to speed up inference by 2x or more. That means you can generate 2x more images per dollar on public clouds.
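The "2x more images per dollar" point is simple arithmetic as long as the GPU is kept busy. A minimal sketch, assuming a hypothetical price of $1.00 per GPU-hour and 4 s per image (made-up numbers, not measurements):

```python
def images_per_dollar(seconds_per_image: float, gpu_price_per_hour: float) -> float:
    """Images one fully utilized GPU produces per dollar of rental (hypothetical inputs)."""
    images_per_hour = 3600.0 / seconds_per_image
    return images_per_hour / gpu_price_per_hour


if __name__ == "__main__":
    baseline = images_per_dollar(4.0, 1.0)         # assumed: 4 s/image at $1.00/hour
    accelerated = images_per_dollar(4.0 / 2, 1.0)  # assumed 2x faster inference engine
    print(baseline, accelerated, accelerated / baseline)
```

Halving the time per image doubles images per dollar only while the GPU stays fully utilized; idle time on an always-on VM is paid for either way, which is why acceleration and autoscaling complement each other.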