theidiotrocketeer t1_j9e9iw9 wrote on February 21, 2023 at 7:41 AM

Is it psychotic to use a GPT-based model for what could be treated as image segmentation?

For my task, I trained a GPT model to predict a mask for an input integer matrix in which certain rows consist entirely of a spurious value. The mask replaces the spurious integers with X's. It is a text-based model for what could be considered an image task.
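To make the setup concrete, here is a rough sketch of how one training pair could be built as text. The serialization (space-separated integers, one row per line), the function name `make_example`, and the assumption that the spurious value is known when generating data are all illustrative, not necessarily the exact format used.

```python
def make_example(matrix, spurious_value):
    """Return (prompt_text, target_text) for one integer matrix.

    Hypothetical format: one row per line, values separated by spaces.
    Rows made up entirely of `spurious_value` become rows of X's in the target.
    """
    def row_to_text(row):
        return " ".join(str(v) for v in row)

    prompt_lines, target_lines = [], []
    for row in matrix:
        prompt_lines.append(row_to_text(row))
        if all(v == spurious_value for v in row):
            # Whole row is spurious: mask every entry with X.
            target_lines.append(" ".join("X" for _ in row))
        else:
            # Clean row: copy it through unchanged.
            target_lines.append(row_to_text(row))
    return "\n".join(prompt_lines), "\n".join(target_lines)


matrix = [[3, 1, 4], [9, 9, 9], [2, 7, 1]]
prompt, target = make_example(matrix, spurious_value=9)
print(prompt)
print(target)
```

The model would then be trained to generate `target` given `prompt`, which is a per-cell labeling problem written out as a sequence.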

activatedgeek t1_j9lobdv wrote on February 22, 2023 at 9:41 PM

It is no longer uncommon to model an image as a sequence of patch tokens and feed that sequence to a transformer-based model. So not psychotic at all.

See "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (Dosovitskiy et al., 2020).
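As a minimal sketch of the patch idea, assuming a single-channel image stored as a list of rows, the code below cuts the grid into non-overlapping square patches and flattens each one into a token vector. The name `patchify` and the tiny 4x4 input are made up for illustration. In the paper each flattened patch is then linearly projected to an embedding before going into the transformer, which is left out here.

```python
def patchify(image, patch_size):
    """Split a 2D grid into flattened, non-overlapping square patches.

    Patches are returned in row-major order (left to right, top to bottom),
    so the result is a sequence of patch_size * patch_size long vectors.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch row by row into a single vector.
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A 4x4 grid holding the values 0..15, cut into four 2x2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = patchify(image, patch_size=2)
print(len(tokens), len(tokens[0]))
print(tokens[0])
```

An integer matrix like the one in the question fits the same mold: either each cell or each small block of cells becomes one element of the sequence.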