Submitted by thomasahle t3_118gie9 in MachineLearning
activatedgeek t1_j9lux6q wrote
You are implying that the NN learns exp(logits) instead of the logits, without really constraining the outputs to be positive. It probably won't be a proper scoring rule, though it might appear to work.
In some ways, this is similar to how you can learn classifiers with mean squared error by regressing directly to the one-hot vector of the class label (there you also don't care about the outputs being positive). It works, and it corresponds to a proper scoring rule called the Brier score.
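(Not from the thread, just to make the connection concrete: a minimal PyTorch sketch of regressing to the one-hot label with MSE. Summing the squared errors over classes gives exactly the multi-class Brier score; the function name is my own.)

import torch
import torch.nn.functional as F

def brier_loss(outputs, labels):
    # squared error against the one-hot encoding of the label,
    # summed over classes: this is the multi-class Brier score
    one_hot = F.one_hot(labels, num_classes=outputs.shape[1]).float()
    return ((outputs - one_hot) ** 2).sum(dim=1).mean()

outputs = torch.randn(8, 10)            # raw network outputs, no softmax
labels = torch.randint(0, 10, (8,))
print(brier_loss(outputs, labels))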
thomasahle OP t1_j9nprmt wrote
Great example! With Brier scoring (summing the squared errors against the one-hot label vector) we have

    loss = norm(x)**2 - x[label]**2 + (1 - x[label])**2
         = norm(x)**2 - 2*x[label] + 1

which is basically equivalent to replacing logsumexp with norm**2 in the first code:
def label_cross_entropy_on_logits(x, labels):
    # Brier-style loss on raw logits: ||x||^2 - 2*x[label], dropping the constant +1
    return (-2*x.select(labels) + x.norm(axis=1)**2).sum(axis=0)
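(A quick numerical check, a sketch of my own in plain PyTorch rather than the code from the original post, showing that the norm(x)**2 - 2*x[label] form matches the full Brier score up to the dropped constant +1 per example:)

import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(8, 10)                      # logits: batch of 8, 10 classes
labels = torch.randint(0, 10, (8,))

# full Brier score: squared error against the one-hot label vector
brier = ((x - F.one_hot(labels, num_classes=10).float()) ** 2).sum(dim=1)

# simplified form used above: ||x||^2 - 2*x[label] (constant +1 dropped)
simplified = (x ** 2).sum(dim=1) - 2 * x[torch.arange(8), labels]

print(torch.allclose(brier, simplified + 1))    # True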
This actually works just as well as my original method! The Wikipedia article on proper scoring rules also mentions the "spherical score", which seems to be equivalent to my method of dividing by the norm. So maybe that's the explanation?
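(For what it's worth, a sketch of what "dividing by the norm" looks like as a loss: the negative spherical score applied directly to the raw outputs. The textbook spherical score is defined on probability vectors, so this takes the same liberty as applying the Brier loss to logits; the helper name is mine.)

import torch

def spherical_loss_on_logits(x, labels, eps=1e-8):
    # negative spherical score on raw outputs: maximize x[label] / ||x||_2,
    # i.e. push the output vector toward the label's coordinate axis
    picked = x[torch.arange(x.shape[0]), labels]
    return (-picked / (x.norm(dim=1) + eps)).mean()

x = torch.randn(8, 10, requires_grad=True)
labels = torch.randint(0, 10, (8,))
spherical_loss_on_logits(x, labels).backward()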
Note, though, that I applied the Brier loss directly to the logits, which is probably not how it is meant to be used...