Submitted by kingdroopa t3_10f7dyr in MachineLearning
Could you recommend any SOTA models using U-Net?
+1 for U-Nets. Since IR will be a single channel, you could use a single-class semantic-segmentation-style model (i.e. a U-Net with a 1-channel output passed through a sigmoid). Something like this would get you started:
import segmentation_models as sm  # https://github.com/qubvel/segmentation_models

model = sm.Unet('resnet34', classes=1, activation='sigmoid')  # 1-channel sigmoid output
Edit: Forgot the link for the package I'm referencing: https://github.com/qubvel/segmentation_models
Many of the most popular encoders/backbones are implemented in that package
Edit 2: Is the FOV important? If you could resize the images so that the RGB & IR FOV are equivalent then that would make things a lot simpler
Thanks a lot! Will look into it, but it seems like the U-Net outputs are segmentation masks, whilst I want it to actually output (generate) IR-image equivalents of the RGB images. Is there something I'm missing, perhaps?
The U-Net I described will output a continuous number between 0 and 1 for each pixel, which you can use as a proxy for your IR image.
People often apply a threshold (e.g. 0.5) to this output to create a binary mask, which might be where you're getting confused.
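A minimal sketch of the difference, assuming a Keras-style model like the one above (rgb_batch is a hypothetical batch of RGB inputs scaled to [0, 1]):

import numpy as np

pred = model.predict(rgb_batch)           # shape (N, H, W, 1), sigmoid values in [0, 1]
ir_proxy = pred[..., 0]                   # use the raw continuous output as the IR image
mask = (ir_proxy > 0.5).astype(np.uint8)  # thresholding instead yields a binary seg mask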
Ahh, I see. Thanks! I'll write it down in my TODO list. Might have to investigate seg masks a bit more :)
I think one important part here is the "misalignment" of the images. Have you tried cutting and resizing the images so that they show the same region? You wouldn't need a GAN then.
The GAN models I've tested are based on the 'unaligned' approach (e.g. CycleGAN). I still haven't tested cutting and resizing the images to make them show the same region. My immediate thought is that the top and bottom of both images might disappear, but perhaps that's OK still?
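For what it's worth, a minimal crop-and-resize sketch with OpenCV; the crop fractions, filename and target resolution are placeholders you'd derive from the two cameras' actual FOVs:

import cv2

rgb = cv2.imread('rgb.png')                  # placeholder filename
h = rgb.shape[0]
cropped = rgb[int(0.1 * h):int(0.9 * h), :]  # drop top/bottom 10% (assumed excess FOV)
aligned = cv2.resize(cropped, (640, 512))    # assumed IR resolution (width, height)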
If the two cameras are rigidly fixed, then you can calibrate them like one calibrates a stereo pair, and at least align the orientations and intrinsics. Points very far from the camera will be well aligned; ones very close will remain misaligned.
The calibration process will involve you picking corresponding points by hand, but the maths for the correction is very simple after that.
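A minimal homography-based sketch of that correction, assuming hand-picked point correspondences (the coordinates and filenames below are placeholders):

import cv2
import numpy as np

rgb = cv2.imread('rgb.png')  # placeholder filenames
ir = cv2.imread('ir.png')

# Corresponding points picked by hand in each image (placeholder values).
rgb_pts = np.array([[120, 80], [540, 95], [515, 400], [130, 390]], dtype=np.float32)
ir_pts = np.array([[100, 60], [520, 70], [500, 380], [110, 370]], dtype=np.float32)

H, _ = cv2.findHomography(rgb_pts, ir_pts)  # planar approximation of the alignment
aligned_rgb = cv2.warpPerspective(rgb, H, (ir.shape[1], ir.shape[0]))  # RGB in the IR frame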
Have you already tried different variants of GAN for more stable training?
I have tried CycleGAN, CUT (an improvement on CycleGAN), NEGCUT (similar to CUT) and ACL-GAN.
Autoencoders like VAEs should work better than any other models for image-to-image translation. Maybe you can try different VAE models and compare their performance.
I was wrong.
Hmm, interesting! Do you have any papers/article/sources supporting this claim?
Sorry, I was wrong. Modern deep VAEs can match SOTA GAN performance for image super-resolution (https://arxiv.org/abs/2203.09445), but I don't have evidence for recoloring.
But diffusion models have been shown to outperform GANs on multiple image-to-image translation tasks, e.g. https://deepai.org/publication/palette-image-to-image-diffusion-models
You could probably reframe your problem as an image colorization task (https://paperswithcode.com/task/colorization), and the SOTA there is still Palette, linked above.
Thanks :) I noticed Palette uses paired images, whilst mine are a bit misaligned. Would you consider it a paired image set, or unpaired? They look closely similar, but don't share pixel information in the top/bottom of the images.
That depends on the extent to which the pixel information is misaligned, I think. If cropping your images is not a solution and a large portion of your images have this issue, the model won't be able to generate the right pixel information for the misaligned sections. But it's worth giving Palette a try if the misalignment is not significant.
Maybe you could also turn the RGB image into grayscale and use it as an additional supervised loss for regularization, and maybe get more stable training. A sketch of the idea is below.
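A minimal sketch in Keras/TF, where the grayscale term is an assumed auxiliary L1 loss and alpha is a placeholder weight:

import tensorflow as tf

def ir_loss_with_gray_reg(rgb, ir_true, ir_pred, alpha=0.1):
    # Primary term: L1 between predicted and ground-truth IR.
    main = tf.reduce_mean(tf.abs(ir_true - ir_pred))
    # Auxiliary term: predicted IR should roughly track the grayscale structure.
    gray = tf.image.rgb_to_grayscale(rgb)  # shape (..., H, W, 1); assumes inputs in [0, 1]
    aux = tf.reduce_mean(tf.abs(gray - ir_pred))
    return main + alpha * aux  # alpha is an assumed weighting, tune on validation data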
Interesting! I will for sure write that down in my TODO list, thanks!
You cannot just translate visible light to IR. No matter what machine learning you use, this is physically impossible.
Correct, it's not physically possible. This is a research project to find to what degree it IS possible :)
Okay, in that case, I'll try to be a bit more helpful lol.
I think you absolutely need to use something like YOLO for object identification/classification.
Humans and animals are warmer than the environment
Cars and other vehicles are warmer than the environment
Glass blocks IR but not visible light
You could get the overall "look" with just image-based networks, but to make it really convincing (more like COD's thermal vision) you need classification in order to make objects that are supposed to be hot look hot. A rough sketch of combining the two is below.
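A rough sketch of that combination, assuming the Ultralytics YOLO package and a predicted IR image scaled to [0, 1]; the class list, weights file and brightness boost are placeholder assumptions:

from ultralytics import YOLO  # assumed detector choice; any detector would do
import numpy as np

detector = YOLO('yolov8n.pt')  # placeholder weights
HOT_CLASSES = {0, 2, 3, 5, 7}  # COCO ids: person, car, motorcycle, bus, truck (assumed "hot")

def add_heat_prior(rgb_img, ir_pred, boost=0.3):
    # Brighten regions the detector flags as warm objects in the predicted IR image.
    out = ir_pred.copy()
    for box in detector(rgb_img)[0].boxes:
        if int(box.cls) in HOT_CLASSES:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            out[y1:y2, x1:x2] = np.clip(out[y1:y2, x1:x2] + boost, 0.0, 1.0)
    return out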
Architecturally, probably some form of U-Net is best. It's the architecture of choice for things like segmentation, so I imagine it would be good for IR as well.