Submitted by skn133229 t3_zvraje in MachineLearning
I am working on a U-Net model that takes 64x64 Landsat imagery as input and outputs various classes of agricultural features. Training works okay when I scale the surface reflectance (SR) values to 0-1 (i.e. divide raw SR by 2^16 = 65536). What I've noticed is that the model seems to be memorizing the range of values in each image rather than learning the shapes and spatial patterns. The result is that predictions vary a bit too much from year to year, and years not appearing in the training dataset get suboptimal predictions. Batch normalization does not seem to change anything: the model converges faster, but the problem remains.
What I've tried is normalizing each image individually by subtracting each channel's mean and dividing by its standard deviation. This preserves the relative spatial patterns and shapes but brings every image to a mean of 0 and standard deviation of 1. Feeding these normalized images to the model does not work: I get precision and recall of 0, and pretty much all predictions are 0. Is there a reason why this would happen? Am I missing something about the way U-Net works? Any insight would be appreciated.
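For clarity, the per-image normalization described above can be sketched like this (a minimal NumPy version, assuming images are float arrays of shape (H, W, C); the `eps` guard is an addition to avoid dividing by zero on flat channels):

```python
import numpy as np

def standardize_per_image(img, eps=1e-8):
    # Per-channel mean and std computed over this single image only.
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True)
    # Every channel of every image ends up with mean ~0 and std ~1.
    return (img - mean) / (std + eps)

img = np.random.rand(64, 64, 6).astype(np.float32)
out = standardize_per_image(img)
```

Note that this discards the absolute reflectance level of each image, which is exactly why identical spatial patterns at different brightness levels become indistinguishable.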
Update: This may be useful to others. I was able to resolve the problem. Apparently, having each image channel average to 0 does not allow the model to train properly and converge. What I've done is randomly shift the mean of each image and add some random jitter. The model can then train properly. Thanks everyone for your insights. Next, I will evaluate the new model for accuracy of year-over-year predictions.
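A rough sketch of that fix, assuming it is applied after per-image standardization; the shift and jitter scales here are illustrative guesses, not the values actually used:

```python
import numpy as np

def shift_and_jitter(img, rng, shift_scale=0.5, jitter_scale=0.05):
    # Random per-channel mean shift so channels no longer average to 0.
    shift = rng.normal(0.0, shift_scale, size=(1, 1, img.shape[-1]))
    # Small random pixel-level noise on top.
    jitter = rng.normal(0.0, jitter_scale, size=img.shape)
    return img + shift + jitter

rng = np.random.default_rng(0)
img = np.random.rand(64, 64, 6)
aug = shift_and_jitter(img, rng)
```

Because the shift is resampled every time the image is seen, the absolute level carries no stable signal, which should push the model toward the spatial patterns instead.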
newperson77777777 t1_j1qv2cz wrote
You may want to check for bugs in your normalization process as well. Also, standard normalization for images is generally done over the training set rather than per image: you compute the mean and standard deviation over your training set and use those to normalize every image. However, you seem to suggest there are differences between images depending on the year and other qualities, so you may want to standard-normalize by year or by some other grouping.
I'm assuming this is a segmentation problem? How do you know the model is memorizing the input range rather than learning the spatial patterns? If there's a memorization issue, the model may be overfitting, so you may want to consider data augmentation, more data, or simpler models.
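For segmentation, any geometric augmentation has to be applied identically to the image and its label mask so they stay aligned. A minimal sketch with NumPy (function and parameter names are illustrative, not from the thread):

```python
import numpy as np

def augment_pair(img, mask, rng):
    # Same random 90-degree rotation for image (H, W, C) and mask (H, W).
    k = int(rng.integers(0, 4))
    img, mask = np.rot90(img, k), np.rot90(mask, k)
    # Same random horizontal flip for both.
    if rng.random() < 0.5:
        img, mask = np.fliplr(img), np.fliplr(mask)
    return img.copy(), mask.copy()

rng = np.random.default_rng(42)
img = np.random.rand(64, 64, 6)
mask = np.random.randint(0, 5, size=(64, 64))
aug_img, aug_mask = augment_pair(img, mask, rng)
```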