Submitted by Alex-S-S t3_zh69o0 in MachineLearning

Is there a way to make a neural network that performs only regression to estimate its own error at inference time, without having ground truths for reference?

My network predicts N points and I know the [x,y] coordinates for each. On a labeled test set I can compute the distance between each predicted point and its ground truth; however, I want the network to be able to estimate these distances by itself.

I do not have separate classes, the network is trained using just the L2 loss between its predicted points and the expected ground truth points.

24

Comments


Fancy_Traffic7435 t1_izkikl5 wrote

Can you? Sure. Should you? Not sure about that.

If you think of neural networks as underlying probability distributions, then estimating the error of your regression as a single value can be misleading, since the true error actually lies within a distribution of predicted errors. Going off of that, it would be better to analyze the distribution of your errors in relation to your predictions.

16

Fancy_Traffic7435 t1_izkksyd wrote

If you can assume that your test data and your inference-time data come from the same distribution, which they should be unless data drift is occurring, then you can build the error distribution of your trained model from its errors on the test set. From there you could simply say, "This is my prediction and this is the general distribution of error, so this is the distribution of my potential values," or you could take it a step further and look at the errors associated with the test points most similar to the new input.
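A minimal sketch of that idea, using toy data in place of real test-set predictions:

```python
import numpy as np

# Toy stand-ins for real test-set predictions and ground truths.
rng = np.random.default_rng(0)
test_preds = rng.normal(size=500)
test_truth = test_preds + rng.normal(0.3, 0.5, size=500)
residuals = test_truth - test_preds  # empirical error distribution

# At inference time there is no ground truth, so attach the
# test-set error distribution to the new prediction instead.
new_pred = 1.7
lo, hi = new_pred + np.quantile(residuals, [0.05, 0.95])
print(f"90% empirical interval: [{lo:.2f}, {hi:.2f}]")
```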

There may be other techniques that work better, so further reading may help.

4

Unlikely-Video-663 t1_izktnhx wrote

You might be able to recast the problem by assuming the labels are actually drawn from some distribution, putting a simple likelihood function over them, and learning the parameters of that distribution. This is not theoretically sound: you won't capture any epistemic uncertainty, but you will capture most of the aleatoric uncertainty, so depending on your use case it might work.

In practice, use for example a Gaussian likelihood and learn the variance alongside the mean with a Gaussian NLL loss. As long as your samples stay within the training distribution, this can work reasonably well.
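A minimal PyTorch sketch of that setup (the two-headed architecture and dimensions are assumptions, not the OP's actual network):

```python
import torch
import torch.nn as nn

# Two heads: one predicts the [x, y] coordinates, the other their variance.
class MeanVarianceNet(nn.Module):
    def __init__(self, in_dim, n_points):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mean_head = nn.Linear(64, n_points * 2)
        self.log_var_head = nn.Linear(64, n_points * 2)  # log keeps variance positive

    def forward(self, x):
        h = self.backbone(x)
        return self.mean_head(h), self.log_var_head(h).exp()

model = MeanVarianceNet(in_dim=10, n_points=4)
loss_fn = nn.GaussianNLLLoss()

x, targets = torch.randn(32, 10), torch.randn(32, 8)  # toy batch
mean, var = model(x)
loss = loss_fn(mean, targets, var)  # replaces the plain L2 loss
loss.backward()
```

At inference time, `var` is then the network's own estimate of its squared error per coordinate.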

Otherwise, there are plenty of recalibration techniques to get better results

2

Equivalent-Way3 t1_izkuzgt wrote

Perhaps a Bayesian neural net would be what you're looking for

6

mgostIH t1_izkzqdm wrote

The paper Epistemic Neural Networks does this formally and efficiently, much more so than Bayesian networks, at the cost of only slightly more compute than a standard forward pass.

10

WigglyHypersurface t1_izl6mi2 wrote

If you don't want to go full Bayesian, there's always the good old bootstrap: retrain the model as many times as you can afford on N replicates of your original data sampled with replacement, then take the variance of the errors across the N retrained models.
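A quick sketch of the bootstrap with a generic regressor (toy data; the architecture is a placeholder):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(200, 10)), rng.normal(size=(200, 2))  # toy data

preds = []
for _ in range(20):  # N bootstrap replicates
    idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
    model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500)
    preds.append(model.fit(X[idx], Y[idx]).predict(X))

preds = np.stack(preds)       # (N, n_samples, 2)
variance = preds.var(axis=0)  # spread across the bootstrap models
```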

7

Phoneaccount25732 t1_izl9pk8 wrote

Monte Carlo Dropout during the forward pass can be used for variance estimation.
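For reference, a minimal PyTorch sketch (the architecture is a toy placeholder); the key trick is keeping dropout active at inference:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.2), nn.Linear(64, 2)
)

model.train()  # keeps dropout stochastic even at inference time
x = torch.randn(1, 10)
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])  # 100 noisy passes
mean, variance = samples.mean(dim=0), samples.var(dim=0)
```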

8

chrysanthemum6359 t1_izlfj5b wrote

Predict the distribution of the residuals. This will involve modelling a contribution from noise within the data and a contribution from uncertainty in the neural network weights. You can model the former with a negative log-likelihood loss, and the latter either by training multiple copies of the same model and taking the mean and standard deviation of their predictions, or by using Monte Carlo dropout.
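A sketch of how the two contributions combine in an ensemble (the (mean, variance) pairs below are placeholders for the outputs of M independently trained NLL-loss models):

```python
import torch

M = 5  # ensemble size; toy predictions stand in for real model outputs
model_outputs = [(torch.randn(8), torch.rand(8)) for _ in range(M)]

means = torch.stack([m for m, _ in model_outputs])      # (M, 8)
variances = torch.stack([v for _, v in model_outputs])  # (M, 8)

ensemble_mean = means.mean(dim=0)
# Law of total variance: average data noise (aleatoric)
# plus disagreement between members (epistemic).
ensemble_var = variances.mean(dim=0) + means.var(dim=0)
```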

For more info, read this nice paper. This one is good too, and there are code examples as well.

2

PandaMomentum t1_izlhb57 wrote

I feel like you're trying to re-invent regression residual analysis in a NN setting, where you compare your data points to the predicted values or class? There are a lot of tools on the regression-diagnostics side, most of which look for things that don't really matter in an arbitrary non-linear curve-fitting process like a NN. So it depends on what you need the error analysis for.

1

AdelSexy t1_izljk4l wrote

MC dropout for uncertainty coming from the model. Learning the mean together with the variance for uncertainty coming from the data. This assumes your target values are sampled from a normal distribution.

1

LimitedConsequence t1_izllezz wrote

The network is already doing its best at minimising the distance. If your final goal is point estimates that minimise the distance, predicting the error is probably not a good way to go about improving performance.

However, if you care about the uncertainty / having a distribution over where the ground truth might be, then there are definitely various techniques that allow this.

For example, if you expect the errors to change depending on some conditioning variable, you could have the neural network output the locations (mean), and the standard deviations (uncertainty) of the positions, given the conditioning variables. In practice you would output log stds and exponentiate to ensure positivity. Then you could use a Gaussian likelihood function approximation, replacing the L2 norm with the negative log likelihood under the Gaussian assumption.
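A small sketch of that log-std parameterization (shapes and tensors are illustrative placeholders, not the OP's network):

```python
import torch

# Hypothetical outputs for a batch of 32 samples, 4 points, [x, y] each.
mean = torch.randn(32, 4, 2, requires_grad=True)
log_std = torch.randn(32, 4, 2, requires_grad=True)  # second output head
target = torch.randn(32, 4, 2)

std = log_std.exp()  # exponentiate to guarantee a positive std
# Gaussian negative log-likelihood (up to a constant), replacing the L2 loss.
nll = (log_std + 0.5 * ((target - mean) / std) ** 2).mean()
nll.backward()
```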

2

Celmeno t1_izlxq9o wrote

What you want is a Bayesian estimator. It gives you a probability distribution over all possible regression values (where the mode/expected value is the equivalent of the point estimate you are used to). The narrower the distribution, the higher the estimated accuracy. You basically get the value and its expected error all in one. No problem coding this into a neural network.

1

cruddybanana1102 t1_iznado4 wrote

You should check out what's known in the field as "uncertainty-aware learning". It's definitely not the same as "getting NNs to estimate their own uncertainty", but certainly helpful for what you want to do.

1

Apfelbrecher t1_izt36zl wrote

Maybe I don't quite get the question; I am pretty new to ML. But doesn't it sound a little bit like neural processes? You get uncertainty feedback by splitting your data into subsets and training several NN models, thus getting some kind of distribution over possible functions.

1