bremen79

bremen79 t1_j97sb9r wrote

The sigmoid will effectively make it very hard for the network to produce values close to 1, because that would require a pre-activation value approaching infinity. Would this be good behavior in your application?
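
To make the saturation point concrete, here is a minimal sketch that inverts the sigmoid to see how large the pre-activation has to be for a few target outputs. The helper names `logit` and `sigmoid` and the list of targets are just illustrative choices, not anything from the original question.

```python
import math

def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of the sigmoid: the pre-activation z with sigmoid(z) == p."""
    return math.log(p / (1.0 - p))

# Illustrative target outputs, each closer to 1 than the last.
targets = [0.9, 0.99, 0.999, 0.999999]
required = [logit(p) for p in targets]

for p, z in zip(targets, required):
    # The gradient of the sigmoid at that point is p * (1 - p).
    print(f"output {p}: pre-activation {z:.2f}, gradient {p * (1 - p):.2e}")
```

The required pre-activation grows without bound as the target approaches 1, while the gradient `p * (1 - p)` shrinks toward zero, so each further step toward 1 gets harder to learn.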

4

bremen79 t1_j0n8wsn wrote

Platt scaling comes with no guarantees, and in fact it is easy to construct examples where it fails. Conformal prediction methods, on the other hand, need only very weak assumptions: for the multiclass problem in the question, they would give you a set of labels that is guaranteed to contain the true label with a specified probability.
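
As a rough illustration, here is a minimal sketch of split conformal prediction, one common variant of these methods. It assumes you have held-out calibration data that is exchangeable with the test data, and class probabilities from any classifier. The function names, the toy data generator, and the score `1 - p(true class)` are my own illustrative choices, and the guarantee is marginal (on average over calibration and test draws), not per example.

```python
import math
import random

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold from calibration data for target coverage 1 - alpha."""
    n = len(cal_labels)
    # Nonconformity score: one minus the probability given to the true class.
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points: every label is included
    return scores[k - 1]

def prediction_set(probs, threshold):
    """All labels whose score does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(rng, n, num_classes):
    """Hypothetical toy data: noisy logits with the true class boosted."""
    labels = [rng.randrange(num_classes) for _ in range(n)]
    probs = []
    for y in labels:
        logits = [rng.gauss(0.0, 1.0) for _ in range(num_classes)]
        logits[y] += 2.0
        probs.append(softmax(logits))
    return probs, labels

rng = random.Random(0)
cal_probs, cal_labels = sample(rng, 1000, 5)
test_probs, test_labels = sample(rng, 2000, 5)

alpha = 0.1  # ask for sets that contain the true label about 90% of the time
threshold = conformal_threshold(cal_probs, cal_labels, alpha)
sets = [prediction_set(p, threshold) for p in test_probs]
coverage = sum(y in s for s, y in zip(sets, test_labels)) / len(test_labels)
print(f"empirical coverage: {coverage:.3f}")
```

Note that the classifier itself is never recalibrated here: the method only picks a threshold on the scores, and a poorly calibrated model shows up as larger label sets rather than as a broken coverage guarantee.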

3