the_new_scientist t1_j7vu5fk wrote

Yes, the DINO paper showed that segmentation ability emerges in self-supervised vision transformers: their self-attention maps pick out object boundaries without any segmentation labels.

https://arxiv.org/abs/2104.14294

Edit: oops, I didn't realize you asked about image generation models; I thought you meant vision models in general.
