Submitted by jayalammar t3_xvje2n in MachineLearning
new_name_who_dis_ t1_ir2f1oy wrote
When you say that OpenCLIP can potentially replace the CLIP model, the rest doesn't need to be retrained, does it? Is the CLIP model trained jointly with the diffusion UNet and autoencoder?
jayalammar OP t1_ir2im9w wrote
New Stable Diffusion models have to be trained to use the OpenCLIP model. That's because many components in the attention/ResNet layers are trained to work with the representations learned by CLIP, so swapping in OpenCLIP without retraining would be disruptive.
In that training process, however, OpenCLIP can be frozen, just as CLIP was frozen in the training of Stable Diffusion / LDM.
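To make the "frozen" part concrete, here is a minimal PyTorch sketch of one training step in which the text encoder's weights stay fixed while only the denoiser is updated. Everything in it is an assumption for illustration: `ToyTextEncoder` and `ToyDenoiser` are hypothetical stand-ins (not the real CLIP/OpenCLIP or Stable Diffusion UNet), the shapes are tiny, and the noising step skips the actual diffusion noise schedule.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class ToyTextEncoder(nn.Module):
    """Hypothetical stand-in for a pretrained text encoder (CLIP or OpenCLIP)."""

    def __init__(self, vocab_size=100, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens):
        return self.proj(self.embed(tokens))  # (batch, seq, dim)


class ToyDenoiser(nn.Module):
    """Hypothetical stand-in for the UNet: latents cross-attend to text embeddings."""

    def __init__(self, latent_dim=8, text_dim=16):
        super().__init__()
        self.to_q = nn.Linear(latent_dim, text_dim)
        self.attn = nn.MultiheadAttention(text_dim, num_heads=2, batch_first=True)
        self.out = nn.Linear(text_dim, latent_dim)

    def forward(self, noisy_latents, text_emb):
        q = self.to_q(noisy_latents)
        attended, _ = self.attn(q, text_emb, text_emb)  # keys/values come from the text
        return self.out(attended)  # predicted noise


text_encoder = ToyTextEncoder()
denoiser = ToyDenoiser()

# Freeze the text encoder: no gradients are tracked for it, and it runs in eval mode.
text_encoder.requires_grad_(False)
text_encoder.eval()

# Only the denoiser's parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(denoiser.parameters(), lr=1e-3)

# Fake batch: 4 prompts of 5 tokens, 4 latents of 6 positions x 8 channels.
tokens = torch.randint(0, 100, (4, 5))
latents = torch.randn(4, 6, 8)
noise = torch.randn_like(latents)
noisy_latents = latents + noise  # simplification: no real noise schedule

# Snapshots to check which weights move during the step.
encoder_before = text_encoder.proj.weight.detach().clone()
denoiser_before = denoiser.out.weight.detach().clone()

with torch.no_grad():
    text_emb = text_encoder(tokens)

loss = nn.functional.mse_loss(denoiser(noisy_latents, text_emb), noise)
optimizer.zero_grad()
loss.backward()
optimizer.step()

encoder_unchanged = torch.equal(encoder_before, text_encoder.proj.weight)
denoiser_changed = not torch.equal(denoiser_before, denoiser.out.weight)
print(f"encoder unchanged: {encoder_unchanged}, denoiser changed: {denoiser_changed}")
```

In this sketch the denoiser's cross-attention projections are fit to whatever embedding space the frozen encoder produces, which is the reason a different encoder can't simply be dropped in afterwards without retraining those layers.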