new_name_who_dis_ t1_ir2f1oy wrote on October 4, 2022 at 8:52 PM

When you say that OpenCLIP can potentially replace the CLIP model, the rest doesn't need to be retrained, does it? Is the CLIP model trained jointly with the diffusion U-Net and autoencoder?

jayalammar OP t1_ir2im9w wrote on October 4, 2022 at 9:14 PM

New Stable Diffusion models have to be trained to use the OpenCLIP model. That's because many components in the attention/ResNet layers are trained to work with the representations learned by CLIP. Swapping in OpenCLIP without retraining those components would be disruptive.

In that training process, however, OpenCLIP can stay frozen, just as CLIP was frozen during the training of Stable Diffusion / LDM.
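
To make the frozen-encoder point concrete, here is a toy sketch in plain Python. It is not Stable Diffusion code: `clip_like` and `openclip_like` are hypothetical scalar stand-ins for the two text encoders, and `(w, b)` is a hypothetical stand-in for the downstream layers that consume the embeddings. It only illustrates the pattern: the encoder is never updated, the downstream weights adapt to its outputs, and swapping the encoder without retraining breaks the fit.

```python
# Toy illustration only. All names are hypothetical stand-ins,
# not identifiers from Stable Diffusion, CLIP, or OpenCLIP.

def make_encoder(scale, shift):
    """Return a 'frozen encoder': a fixed function with no trainable state."""
    def encode(x):
        return scale * x + shift
    return encode

def mse(encode, w, b, data):
    """Mean squared error of the downstream layer on top of the encoder."""
    return sum((w * encode(x) + b - y) ** 2 for x, y in data) / len(data)

def train_downstream(encode, data, w=0.0, b=0.0, lr=0.05, steps=5000):
    """Gradient descent on (w, b) only; the encoder is called but never changed."""
    n = len(data)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in data:
            e = encode(x)              # frozen embedding
            err = w * e + b - y
            grad_w += 2 * err * e / n
            grad_b += 2 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic task: inputs in [-1, 1], targets from an arbitrary linear rule.
data = [(i / 4, 3 * (i / 4) + 1) for i in range(-4, 5)]

clip_like = make_encoder(2.0, 0.0)       # stand-in for the original encoder
openclip_like = make_encoder(-0.5, 1.0)  # stand-in with different embeddings

# 1) Train the downstream weights against the first frozen encoder.
w, b = train_downstream(clip_like, data)
loss_matched = mse(clip_like, w, b, data)

# 2) Swap the encoder but keep the old downstream weights: the fit breaks.
loss_swapped = mse(openclip_like, w, b, data)

# 3) Retrain the downstream weights with the new encoder still frozen.
w2, b2 = train_downstream(openclip_like, data)
loss_retrained = mse(openclip_like, w2, b2, data)

print(f"matched:   {loss_matched:.2e}")
print(f"swapped:   {loss_swapped:.2e}")
print(f"retrained: {loss_retrained:.2e}")
```

In this toy, the loss should be near zero in the matched and retrained cases and much larger in the swapped case. In a real PyTorch setup the analogous move would be to exclude the text encoder's parameters from the optimizer, but the details depend on the training code being used.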