I'm training a per-pixel image classification network, which, for each pixel in the image, predicts whether it is a sign for disease A or disease B. Note that a given pixel could be a sign for both disease A and disease B (this is a multi-label problem).

My question is: are the relative probabilities going to be calibrated? In other words, does it make sense to sort the NxNx2 probabilities, or are the probabilities for the two diseases (i.e. channels) not calibrated / comparable, since it is similar to solving two independent problems?

If it matters, I am using a ResNet, some fully-connected layers, and then a convolutional decoder.

Any thoughts will be much appreciated, thanks in advance!

Comments

pm_me_your_ensembles t1_j01xzcw wrote on December 13, 2022 at 2:23 PM

#919,840

The two are not comparable. In a multi-class single-label problem, you do K distinct projections, one for each class, and then combine them via softmax to get something that resembles probabilities. Since no such function is applied in your multi-label setup, the two channels cannot be compared: they don't influence each other in any way.

However, you shouldn't treat whatever a NN outputs as a probability even if it's within [0,1] as NNs are known to be overconfident.

alkaway OP t1_j01zhl7 wrote on December 13, 2022 at 2:33 PM

#919,898

Replying to pm_me_your_ensembles (#919,840)

Thanks so much for your response!

This makes sense. Are you aware of any techniques that can be used to make these probabilities comparable?

I understand that the outputs shouldn't necessarily be treated as probabilities. I simply want a relative ordering of the pixels in terms of "likelihood."

[deleted] t1_j023o61 wrote on December 13, 2022 at 3:03 PM

#920,090

Replying to alkaway (#919,898)

[deleted]

trajo123 t1_j023qfb wrote on December 13, 2022 at 3:04 PM

#920,093

Replying to alkaway (#919,898)

You could reformulate your problem to output 4 channels: "only disease A", "only disease B", "both disease A and disease B" and "no disease". This way a softmax can be applied to these outputs, with their probabilities summing to 1.

[EDIT] corrected number of classes
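As a rough illustration of this reformulation, here is a minimal sketch that maps each present/absent combination of labels to a single class index. The function names and label strings are hypothetical, not taken from any library. Note that the number of combined classes is 2**K for K labels, so this is only practical for a handful of labels.

```python
from itertools import product

def powerset_classes(label_names):
    """Enumerate every present/absent combination of the given labels."""
    return list(product([0, 1], repeat=len(label_names)))

def encode(present):
    """Map a multi-hot tuple such as (1, 0) to a single class index."""
    index = 0
    for bit in present:
        index = index * 2 + bit  # read the tuple as a binary number
    return index

labels = ["disease_A", "disease_B"]  # hypothetical label names
classes = powerset_classes(labels)

print(len(classes))        # number of combined classes for 2 labels
print(encode((1, 1)))      # class index for "both A and B"
print(2 ** 10)             # combined classes needed for only 10 labels
```

A softmax over these combined classes would then produce mutually exclusive probabilities per pixel.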

alkaway OP t1_j024u31 wrote on December 13, 2022 at 3:12 PM

#920,137

Replying to trajo123 (#920,093)

Thanks for your response -- This is an interesting idea! Unfortunately, I am actually training my network to predict 1000+ classes, for which such an idea would be computationally intractable...

alkaway OP t1_j02675d wrote on December 13, 2022 at 3:23 PM

#920,192

Replying to [deleted] (#920,090)

I'm not sure I understand. Are you suggesting I normalize each pixel in each NxN label-map to be mean 0 and std of 1? And then use this normalized label-map during training?

trajo123 t1_j029y2r wrote on December 13, 2022 at 3:52 PM

#920,362

Replying to alkaway (#920,137)

Ah, yes, it doesn't really make sense for more than a couple of classes. So if you can't make your problem multi-class, have you tried any probability calibration on the model outputs? This should make them "more comparable". I think this is the best you can do with a deep learning model.

But why do you want to rank the outputs per pixel? Wouldn't some per-image aggregate over the channels make more sense?

Moderatecat t1_j02b4ph wrote on December 13, 2022 at 4:01 PM

#920,423

Most modern deep neural nets are not well-calibrated by default. Your model output, even after normalization, cannot be interpreted as probabilities unless the model is well-calibrated.

ResponsibilityNo7189 t1_j02dzwf wrote on December 13, 2022 at 4:22 PM

#920,529

It's an open problem to get your network probabilities to be calibrated. First you might want to read about aleatoric vs. epistemic uncertainty: https://towardsdatascience.com/aleatoric-and-epistemic-uncertainty-in-deep-learning-77e5c51f9423

Monte Carlo sampling and training have been used to get a sense of uncertainty.

Also changing the Softmax temperature to get less confident outputs might "help".

pm_me_your_ensembles t1_j02eijz wrote on December 13, 2022 at 4:25 PM

#920,551

Replying to alkaway (#920,192)

Never mind my previous comment.

You could normalize both channels, i.e. for label 1, normalize the NxN tensor of pixel values, and do the same for label 2.

bimtuckboo t1_j02jss1 wrote on December 13, 2022 at 4:59 PM

#920,775

Easiest way to find out is to make some calibration plots with your validation set. From there, depending on what the plots look like, there are some things you can do to improve the calibration post training. Look into temperature scaling and Platt scaling.
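For reference, a calibration plot boils down to binning predictions by confidence and comparing the mean predicted probability in each bin with the observed positive rate. The sketch below is a minimal pure-Python version for a single channel. The function names and the toy data are made up for illustration, and the summary number is one common definition of expected calibration error, not the only one.

```python
def reliability_bins(probs, labels, n_bins=10):
    """Group predictions into equal-width confidence bins.

    Returns (mean_predicted_prob, observed_positive_rate, count)
    for each non-empty bin.
    """
    sums = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[b][0] += p
        sums[b][1] += y
        sums[b][2] += 1
    return [(s[0] / s[2], s[1] / s[2], s[2]) for s in sums if s[2] > 0]

def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted average gap between confidence and observed rate."""
    total = len(probs)
    return sum(
        count / total * abs(mean_p - pos_rate)
        for mean_p, pos_rate, count in reliability_bins(probs, labels, n_bins)
    )

# Toy validation predictions for one channel (hypothetical values).
probs_a = [0.95, 0.90, 0.92, 0.10, 0.15, 0.05]
labels_a = [1, 0, 1, 0, 0, 1]

for mean_p, pos_rate, count in reliability_bins(probs_a, labels_a):
    print(f"predicted={mean_p:.2f} observed={pos_rate:.2f} n={count}")
print(expected_calibration_error(probs_a, labels_a))
```

Plotting predicted against observed for each channel separately shows whether the two channels are miscalibrated in the same way; a well-calibrated channel sits close to the diagonal.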

SlowFourierT198 t1_j02nin5 wrote on December 13, 2022 at 5:22 PM

#920,926

Depending on the problem, you may use Bayesian neural networks, where you fit a distribution over the weights. They are better calibrated but also expensive. There exists some theory on lower-cost ways to make the model better calibrated / uncertainty-aware. One direction is using Gaussian process approximations; another is, for example, PostNet. The overall topic you can search for is uncertainty quantification.

alkaway OP t1_j02owfb wrote on December 13, 2022 at 5:31 PM

#920,999

Replying to trajo123 (#920,362)

Thanks so much for your response! Are you aware of any calibration methods I could try? Preferably ones which won't take long to implement / incorporate :P

alkaway OP t1_j02oxhi wrote on December 13, 2022 at 5:31 PM

#921,001

Replying to Moderatecat (#920,423)

Thanks so much for your response! Are you aware of any calibration methods I could try? Preferably ones which won't to long to implement / incorporate :P

alkaway OP t1_j02oy3b wrote on December 13, 2022 at 5:31 PM

#921,005

Replying to ResponsibilityNo7189 (#920,529)

Thanks so much for your response! Is temperature scaling the go-to calibration method I should try? Does temperature scaling change the relative ordering of the probabilities?

LearnDifferenceBot t1_j02p3jr wrote on December 13, 2022 at 5:32 PM

#921,015

Replying to alkaway (#920,999)

> won't to long

*too

Learn the difference here.

^(Greetings, I am a language corrector bot. To make me ignore further mistakes from you in the future, reply !optout to this comment.)

alkaway OP t1_j02phdn wrote on December 13, 2022 at 5:34 PM

#921,037

Replying to bimtuckboo (#920,775)

Thanks so much for your response! Does temperature scaling change the relative ordering of the probabilities?

alkaway OP t1_j02pj3l wrote on December 13, 2022 at 5:35 PM

#921,041

Replying to SlowFourierT198 (#920,926)

Thanks so much for your response! Will take a look.

bimtuckboo t1_j02qida wrote on December 13, 2022 at 5:41 PM

#921,086

Replying to alkaway (#921,037)

No, it does not. It simply scales the probabilities to either all be closer to 0.5 or all be further from 0.5.
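A small sketch of why the ordering is preserved, assuming sigmoid outputs and a single temperature T > 0 applied to the logits (all values below are hypothetical). One caveat for the two-channel case: if each channel is given its own temperature, the ordering within a channel is still unchanged, but a comparison across channels can flip.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def scale(logits, temperature):
    """Temperature scaling: divide logits by T > 0 before the sigmoid."""
    return [sigmoid(z / temperature) for z in logits]

def ranking(values):
    """Indices sorted from highest to lowest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)

logits = [3.0, -1.0, 0.5, 2.0]  # hypothetical logits for one channel
before = ranking(scale(logits, 1.0))
after = ranking(scale(logits, 2.5))
print(before == after)  # same ordering within the channel

# Separate temperatures per channel (hypothetical fitted values).
logit_a, logit_b = 2.0, 1.5       # A outranks B on the raw logits
p_a = sigmoid(logit_a / 4.0)      # channel A scaled with T = 4.0
p_b = sigmoid(logit_b / 1.0)      # channel B scaled with T = 1.0
print(p_a, p_b)                   # B now has the higher probability
```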

ResponsibilityNo7189 t1_j02t093 wrote on December 13, 2022 at 5:56 PM

#921,211

Replying to alkaway (#921,005)

It does not change the order. It will make the prediction less "stark", i.e. instead of 0.99, 0.0001, 0.002, 0.007, you will get something like 0.75, 0.02, 0.04, 0.19, for instance. It is the easiest thing to do, but remember there isn't any "go-to" technique.
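To make the softening effect concrete, here is a minimal softmax-with-temperature sketch. The logits are made up and will not reproduce the exact numbers above; the point is only that a temperature above 1 flattens the distribution while keeping the same ranking.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature T > 0."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ranking(values):
    """Indices sorted from highest to lowest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)

logits = [7.0, 0.0, 1.0, 2.5]   # hypothetical logits for four classes
sharp = softmax(logits, 1.0)    # original, very confident output
soft = softmax(logits, 3.0)     # T > 1 spreads the probability mass

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in soft])
print(ranking(sharp) == ranking(soft))
```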

Red-Portal t1_j02wvcs wrote on December 13, 2022 at 6:21 PM

#921,422

With deep neural networks, I would say conformal predictions are the best way to get uncertainty estimates.

trajo123 t1_j031wsx wrote on December 13, 2022 at 6:52 PM

#921,697

Replying to alkaway (#920,999)

Perhaps scikit-learn's "Probability calibration" section would be a good place to start. Good luck!

gosnold t1_j036xfj wrote on December 13, 2022 at 7:24 PM

#921,946

Replying to alkaway (#921,001)

Temperature adjustment in the softmax layer is quick and easy
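As a sketch of how the temperature might be chosen, one common approach is to pick the value that minimizes the negative log-likelihood on a held-out validation set. The example below assumes sigmoid outputs (as in the multi-label setup in the question) and uses a plain grid search instead of a gradient-based optimizer to stay short. The function names, grid, and toy data are all hypothetical.

```python
import math

def nll(logits, labels, temperature):
    """Mean binary cross-entropy of sigmoid(logit / T) against 0/1 labels."""
    total = 0.0
    for z, y in zip(logits, labels):
        s = z / temperature
        # log(1 + exp(s)) - y * s, written in a numerically stable form
        total += max(s, 0.0) - y * s + math.log1p(math.exp(-abs(s)))
    return total / len(logits)

def fit_temperature(logits, labels, candidates):
    """Return the candidate temperature with the lowest validation NLL."""
    return min(candidates, key=lambda t: nll(logits, labels, t))

# Overconfident toy logits: large magnitudes, but two of six are wrong.
val_logits = [6.0, 5.0, -6.0, -5.0, 6.0, -6.0]
val_labels = [1, 1, 0, 0, 0, 1]

grid = [0.5 + 0.25 * i for i in range(39)]  # 0.5, 0.75, ..., 10.0
best_t = fit_temperature(val_logits, val_labels, grid)
print(best_t)
print(nll(val_logits, val_labels, 1.0), nll(val_logits, val_labels, best_t))
```

The temperature is fitted once after training and the network weights stay fixed, which is why this is cheap to try.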

CommunismDoesntWork t1_j03qv7i wrote on December 13, 2022 at 9:24 PM

#922,856

Why do you need probabilities? You'd be better off spending more time on making your model more accurate, period, even if it can be confidently wrong sometimes.