[D] Classification with final layer having no activation? Submitted by AbIgnorantesBurros on October 11, 2022 at 3:11 AM in MachineLearning
nullspace1729 wrote on October 11, 2022 at 7:57 AM It's because of something called the log-sum-exp trick. If you combine the activation with the loss, you can compute the log-probabilities in a numerically stable way; applying softmax first and taking the log afterwards underflows when the predicted probabilities saturate to 0 or 1 (i.e. when the logits are large in magnitude).
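A minimal PyTorch sketch (not from the thread, just an illustration of the comment's point): with large-magnitude logits, log(softmax(x)) computed naively produces -inf, while log_softmax uses the log-sum-exp trick internally and stays finite. This is why losses like nn.CrossEntropyLoss / F.cross_entropy take raw logits, so the final layer needs no activation.

```python
import torch
import torch.nn.functional as F

# Large-magnitude logits: softmax saturates to exactly 0 and 1 in
# floating point, so taking log afterwards yields -inf.
logits = torch.tensor([[1000.0, -1000.0]])
naive = torch.log(torch.softmax(logits, dim=-1))
print(naive)  # tensor([[0., -inf]])

# log_softmax applies the log-sum-exp trick internally:
#   log_softmax(x) = x - (max(x) + log(sum(exp(x - max(x)))))
# which never exponentiates a large positive number.
stable = F.log_softmax(logits, dim=-1)
print(stable)  # tensor([[0., -2000.]])

# cross_entropy fuses log_softmax with the negative log-likelihood,
# so it expects raw logits and remains numerically stable.
target = torch.tensor([0])
loss = F.cross_entropy(logits, target)
print(loss)  # tensor(0.)
```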