the_new_scientist t1_j7vu5fk wrote on February 9, 2023 at 7:42 PM

Yes, the DINO paper showed that segmentation ability emerges in self-supervised vision transformers: their self-attention maps pick out object boundaries without any segmentation labels.

Edit: oops, didn't realize you said image generation models, thought you asked for just vision models.

irulenot t1_j7xxq3z wrote on February 10, 2023 at 4:40 AM

Yes, this!! Sorry, I didn't see it.