tdgros t1_jdqbgqy wrote

The model merging offered by some stable diffusion UIs does not merge the weights of a network! It merges the denoising results for a single diffusion step from two different denoisers, which is very different!

Merging the weights of two different models does not, in general, produce something functional, and it can only work for two models with exactly the same structure. It certainly does not "mix their functionality".
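
To make the distinction concrete, output-level merging looks roughly like this at each sampling step: both denoisers are run and only their predictions are blended. A minimal sketch in PyTorch; `unet_a`, `unet_b`, `cond` and the call signature are hypothetical placeholders, not any particular UI's API.

```python
import torch

def blended_eps(unet_a, unet_b, latents, t, cond, alpha=0.5):
    """Blend the noise predictions of two denoisers at a single diffusion step."""
    with torch.no_grad():
        eps_a = unet_a(latents, t, cond)  # prediction from model A
        eps_b = unet_b(latents, t, cond)  # prediction from model B
    # Both models must be evaluated at every step; only their outputs are mixed.
    return alpha * eps_a + (1.0 - alpha) * eps_b
```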

9

Co0k1eGal3xy t1_jdqfxcr wrote

  1. Most stable diffusion UIs DO merge weights, by averaging them.
  2. Averaging weights between checkpoints works really well with CLIP fine-tuning (WiSE-FT), improving performance over both checkpoints on their respective validation sets. https://github.com/mlfoundations/wise-ft
  3. Git Re-Basin found that their permutation-matching merge works even for checkpoints with completely different pretraining data and init weights, and it improves accuracy on a mixed validation set over using either model alone. https://arxiv.org/abs/2209.04836

You're right that merging the model outputs gives higher quality than merging the weights, but OP was asking whether it is possible, and it very much is as long as the weight tensors have the same shapes.
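
For comparison, the weight-level merge being described here is essentially a per-tensor linear interpolation of the two state dicts. A minimal sketch, assuming both checkpoints share the same architecture and key names (the helper name is illustrative):

```python
import torch

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Linearly interpolate two state dicts with identical keys and tensor shapes."""
    merged = {}
    for key, w_a in sd_a.items():
        w_b = sd_b[key]
        assert w_a.shape == w_b.shape, f"shape mismatch for {key}"
        if torch.is_floating_point(w_a):
            merged[key] = alpha * w_a + (1.0 - alpha) * w_b
        else:
            # Integer buffers (e.g. step counters) can't be meaningfully averaged.
            merged[key] = w_a
    return merged

# Usage, assuming plain state-dict files:
# model.load_state_dict(merge_state_dicts(torch.load("a.pt"), torch.load("b.pt")))
```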

12

tdgros t1_jdqjc8q wrote

There's also the weight averaging in ESRGAN that I knew about, but it always irked me. The permutation argument from your third point is the usual reason I invoke on this subject, and the paper does show why it's not as simple as just blending weights! The same reasoning also shows why blending subsequent checkpoints isn't like blending random networks.
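
To see the permutation argument concretely: permuting the hidden units of a layer (together with the matching rows and columns of the neighbouring weight matrices) leaves the function unchanged, yet naively averaging a network with a permuted copy of itself does not preserve that function. A toy sketch, with all sizes and names purely illustrative:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.randn(32, 8)

# Build a functionally identical copy by permuting the 16 hidden units.
perm = torch.randperm(16)
permuted = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
with torch.no_grad():
    permuted[0].weight.copy_(net[0].weight[perm])
    permuted[0].bias.copy_(net[0].bias[perm])
    permuted[2].weight.copy_(net[2].weight[:, perm])
    permuted[2].bias.copy_(net[2].bias)

    print(torch.allclose(net(x), permuted(x), atol=1e-5))  # True: same function, different weights

    # Naive 50/50 weight average of the two "identical" networks.
    averaged = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
    for p_avg, p_a, p_b in zip(averaged.parameters(), net.parameters(), permuted.parameters()):
        p_avg.copy_(0.5 * p_a + 0.5 * p_b)
    print((averaged(x) - net(x)).abs().max())  # far from zero: the average is a different function
```

Subsequent checkpoints of the same training run, by contrast, tend to stay in the same basin with matching unit assignments, which is why averaging them behaves so differently from averaging unrelated networks.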

2

_Arsenie_Boca_ t1_jdqy1n8 wrote

Merging model outputs also means you have to run both models. I think the best option is to merge the weights and then recover performance using data from both domains and distillation from the respective expert models.
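
A minimal sketch of that recipe, assuming the merged-weight student is fine-tuned to mimic each expert on its own domain; the function names, batches and plain MSE-on-outputs loss are illustrative choices, not a prescribed method:

```python
import torch
import torch.nn.functional as F

def distill_step(student, expert_a, expert_b, batch_a, batch_b, optimizer):
    """One recovery step: the weight-merged student matches each expert on its own domain."""
    with torch.no_grad():
        target_a = expert_a(batch_a)  # expert A on domain-A data
        target_b = expert_b(batch_b)  # expert B on domain-B data
    loss = F.mse_loss(student(batch_a), target_a) + F.mse_loss(student(batch_b), target_b)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# The student would typically start from the averaged weights
# (e.g. the merge_state_dicts sketch above) before this recovery phase,
# and only the student needs to be kept at inference time.
```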

2