Submitted by alkibijad t3_10a6whe in MachineLearning
Are there smaller/distilled versions of CLIP? Or some other (smaller) models that connect text and images?
For my use case, the model needs to be small: ideally < 20 MB, fine < 60 MB, okay < 100 MB.
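For scale: stock CLIP ViT-B/32 is around 300 MB even at fp16, so the budget really does require something distilled or much smaller. A quick way to check any candidate against the budget is to estimate serialized size from the parameter count; a minimal sketch, assuming the Hugging Face transformers CLIP classes (bytes-per-parameter is just a precision assumption):

```python
import torch
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

def estimated_size_mb(module: torch.nn.Module, bytes_per_param: int = 2) -> float:
    """Rough on-disk size if weights are stored at the given precision
    (2 bytes/param ~ fp16, 4 ~ fp32, 1 ~ int8)."""
    n_params = sum(p.numel() for p in module.parameters())
    return n_params * bytes_per_param / 1e6

print(f"full CLIP   : {estimated_size_mb(model):.0f} MB at fp16")
print(f"vision only : {estimated_size_mb(model.vision_model):.0f} MB at fp16")
print(f"text only   : {estimated_size_mb(model.text_model):.0f} MB at fp16")
```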
LetterRip t1_j43v3yi wrote
This group did such a distillation and got it down to 24 MB, but they didn't share the weights:
https://www.reddit.com/r/MachineLearning/comments/p1o2bd/research_we_distilled_clip_model_vit_only_from/
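As the thread title says, that was a ViT-only distillation: train a small student image encoder to reproduce the frozen CLIP image embeddings, so it can still be paired with the original CLIP text encoder. A rough sketch of that general recipe (not their code; the timm student and the image loader are stand-ins you'd swap for your own):

```python
import timm
import torch
import torch.nn.functional as F
from transformers import CLIPModel

# Frozen teacher: full CLIP, used only to produce target image embeddings.
teacher = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Small student (~5.7M params, well under 20 MB at fp16) with a 512-d head
# to match CLIP ViT-B/32's projected embedding size.
student = timm.create_model("vit_tiny_patch16_224", pretrained=False, num_classes=512)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

for pixel_values in image_loader:  # placeholder: any large unlabeled image dataset
    with torch.no_grad():
        t = teacher.get_image_features(pixel_values=pixel_values)  # teacher embeddings
    s = student(pixel_values)                                      # student embeddings
    # Match the teacher embedding in both direction and magnitude.
    loss = (1 - F.cosine_similarity(s, t).mean()) + F.mse_loss(s, t)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```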
LAION or stability.ai or huggingface might be willing to provide free compute to distill one of the openCLIP models.
Come to think of it, stability.ai should be releasing the distilled Stable Diffusion later this month (a week or two?), and it will presumably include a distilled CLIP.