Flag_Red t1_iw1lntd wrote
It's mentioned a few times in the articles/readme for this tool that it enables fine-tuning on consumer hardware. Are there any examples of doing something like this? How long does fine-tuning on a 3080 (or something similar) take to teach the model a new concept? What sort of dataset is needed? How does it compare to something like DreamBooth?
I'd love to try fine tuning on some of the datasets I have lying around, but I'm not sure where to start, or even if it's really viable on consumer tech.
enryu42 t1_iw2m1nt wrote
Even without any optimizations, it is possible to fine-tune Stable Diffusion on an RTX 3090, even in fp32, with some effort - batch size 2 is workable if you precompute the latent embeddings and save some VRAM by not keeping the autoencoder params in memory during training.
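A minimal sketch of what I mean by precomputing latents (assuming the diffusers AutoencoderKL API; the model ID and the image dataloader here are placeholders, not exactly what I ran):

```python
import torch
from diffusers import AutoencoderKL

device = "cuda"
# Load only the VAE to encode images once, up front.
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
).to(device)
vae.requires_grad_(False)

all_latents = []
with torch.no_grad():
    for images in image_dataloader:  # hypothetical loader of (B, 3, 512, 512) tensors in [-1, 1]
        posterior = vae.encode(images.to(device)).latent_dist
        all_latents.append((posterior.sample() * 0.18215).cpu())  # SD latent scaling factor

torch.save(torch.cat(all_latents), "latents.pt")

# Drop the autoencoder so its weights don't take up VRAM during training.
del vae
torch.cuda.empty_cache()
```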
But this is definitely not a "one-button" solution, and it requires more effort than existing tools like textual inversion or DreamBooth (which are more appropriate for the "teach the model a new concept" use case).
Flag_Red t1_iw2nxte wrote
If I'm not mistaken, full fine-tuning on one 3090 isn't really feasible because of training times. I haven't tried it, but I was under the impression that matching the results of a DreamBooth run would take an unreasonably long time.
DreamBooth gets around this by bootstrapping a very small number of training examples to learn a single concept. But if I have a few thousand well-labelled images, I should be able to fine-tune on them (maybe with some regularisation?) and get better results.
enryu42 t1_iw2vlwf wrote
Oh, it is totally feasible - I'm getting around 2.5 training examples/second with vanilla SD without any optimizations (over 200k examples per day), which is more than enough for fine-tuning.
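For reference, the step I'm timing is basically the standard noise-prediction objective over precomputed latents and text embeddings. A rough sketch using diffusers (the model ID, tensor shapes, and learning rate are assumptions for illustration, not exactly my setup):

```python
import torch
import torch.nn.functional as F
from diffusers import UNet2DConditionModel, DDPMScheduler

device = "cuda"
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
).to(device)
noise_scheduler = DDPMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler"
)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def training_step(latents, text_embeds):
    # latents: (B, 4, 64, 64) precomputed VAE latents; text_embeds: (B, 77, 768) frozen CLIP embeddings.
    latents = latents.to(device)
    text_embeds = text_embeds.to(device)
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps, (latents.shape[0],), device=device
    )
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states=text_embeds).sample
    loss = F.mse_loss(noise_pred, noise)  # predict the added noise (epsilon objective)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```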
I'd still not recommend it for teaching the model new concepts, though - full fine-tuning is more appropriate for transferring the model to new domains (e.g. here people adapted it to anime images).