nullspace1729 t1_irv5z1j wrote

It’s because of something called the log-sum-exp trick. If you combine the activation with the loss, you get better numerical stability when the predicted probabilities are very close to 0 or 1, i.e. when the logits are large in magnitude.
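
To make that concrete, here is a rough sketch in plain Python for the binary (sigmoid) case. The function names `naive_bce` and `stable_bce_with_logits` are made up for illustration and are not any library's API. The fused version uses the rearrangement max(x, 0) - x*y + log(1 + exp(-|x|)), which is algebraically the same loss as applying the sigmoid first and then taking logs.

```python
import math

def naive_bce(logit, target):
    # Hypothetical helper: apply the sigmoid, then take logs separately.
    # Once p rounds to exactly 0.0 or 1.0, one of the logs sees 0.0.
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def stable_bce_with_logits(logit, target):
    # Hypothetical helper: activation and loss fused into one expression.
    # exp() only ever sees a non-positive argument, so it cannot overflow,
    # and log1p keeps precision when that exponential is tiny.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# For moderate logits the two versions agree.
print(naive_bce(0.5, 1.0), stable_bce_with_logits(0.5, 1.0))

# For a confident wrong prediction the naive version fails,
# because the sigmoid output rounds to exactly 1.0 in double precision.
try:
    naive_bce(40.0, 0.0)
except ValueError as err:
    print("naive version failed:", err)

print(stable_bce_with_logits(40.0, 0.0))
```

The same idea carries over to softmax with cross-entropy, where the log-sum-exp is taken over all class logits after subtracting the maximum.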